This is my understanding of what Emma's doing. Emma's YouTube video helps.
We use graph learning to find the most representative memory (collaboration pattern), and the agent we use here is prepopulated with that memory. See this Jupyter notebook for more information.
- python3.8
- pip install matrx==2.3.0
- pip install typedb-client
- sudo apt install typedb-bin=2.18.0 typedb-server=2.18.0 typedb-console=2.18.0
- Download https://cloudsmith.io/~typedb/repos/public-release/packages/detail/deb/typedb-studio/2.18.0-1/a=amd64;xc=main;d=any-distro%252Fany-version;t=binary/
- First, run the server with `typedb server` at 0.0.0.0:1729
- `nohup sh -c 'GDK_SCALE=2 /opt/typedb-studio/bin/TypeDB\ Studio' &` helps when you want to zoom in.
- Create database `CP_ontology`
- Write `CP_ontology_schema_corrected.tql` as schema
- Write `Basic_instances.tql` as data
- Which scenario do you want to start with?
- Choose numbers from 0 to 8.
- 0 is the dummy round. 1 to 4 are the ones without the brown rock. 5 to 8 are the ones with it.
- Which participant number do you want to use?
- This is the port number opened at your localhost
- Open up the web browser with the port number:
Use the arrow keys to move, B to select, and N to drop.
Collaboration patterns (CPs) are learned by contextual bandits (CBs). Basic behavior is learned by RL (Q-learning).
Emma's original state representation was a vector of length 18, where every element was a one-hot (0 or 1) value. The size of the state space can then be as large as $2^{18} = 262144$.
That's why she now has abstract state representations, which are the product of three sets (progress, contribution, human-standing), whose sizes are 4, 3, and 2, respectively. Now the abstract state space size is $4 \times 3 \times 2 = 24$.
- States are represented as a tuple [progress, contribution, human-standing]
  - progress: 1, 2, 3, 4
    - relative progress of the number of rocks being removed
  - contribution: equal, human, robot
  - human-standing: true, false
- Basic behavior (full MDP)
  - There are five actions for the agent: (1) move back and forth; (2) stand still; (3) pick up; (4) break; and (5) drop.
  - These are "macro" actions.
  - These actions are not always the same. For example, "pick up" applies differently depending on which objects are present in a given state.
  - They form a full MDP, which includes p(s' | s, a) and R(s, a).
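The abstract state space and the macro actions listed above can be enumerated directly. This is only a minimal sketch: the names `ABSTRACT_STATES`, `ACTIONS`, and `q_table` are made up for illustration and are not taken from Emma's code, and the zero-initialized table is just one common way to set up tabular Q-learning.

```python
from itertools import product

# The three components of the abstract state, as listed above.
PROGRESS = (1, 2, 3, 4)
CONTRIBUTION = ("equal", "human", "robot")
HUMAN_STANDING = (True, False)

# Every abstract state is one (progress, contribution, human-standing) tuple.
ABSTRACT_STATES = list(product(PROGRESS, CONTRIBUTION, HUMAN_STANDING))

# The five macro actions of the basic behavior.
ACTIONS = ("move back and forth", "stand still", "pick up", "break", "drop")

# Hypothetical tabular Q-function: one entry per (state, action) pair.
q_table = {(state, action): 0.0 for state in ABSTRACT_STATES for action in ACTIONS}

print(len(ABSTRACT_STATES))  # 4 * 3 * 2 = 24
print(len(q_table))          # 24 * 5 = 120
```

With only 24 states and 5 actions, a plain dictionary is enough, so no function approximation is needed for the basic behavior.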
From the CB point of view, the state space is still the one whose size is 24, and the
action space is the set of all possible CPs. One CP is a series of sub-actions, but from the CB
point of view, one CP is one action. There are some rules involved here, though. For example, at
The reward functions for the RL and CB are almost the same, except that idle time is handled differently.
RL and CB are learned from scratch for both phase 1 and phase 2, although in phase 2, we can take advantage of the CPs collected. Emma thinks that there won't be much learning happening in the CB, since the CB only learns when more than one CP is applicable in a given state / starting condition (it then has to decide which one to choose). This is not so likely, since the users will write fine-grained CPs.
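My reading of the decision logic above, as a minimal sketch: no applicable CP means the basic behavior (RL) runs, exactly one applicable CP is executed directly, and only with two or more does the CB get to choose. The function name `select_behavior` and the `choose` argument are hypothetical, and `random.choice` only stands in for whatever the CB really does.

```python
import random
from typing import Callable, Optional, Sequence


def select_behavior(
    applicable_cps: Sequence[str],
    choose: Callable[[Sequence[str]], str] = random.choice,
) -> Optional[str]:
    """Return the CP to execute, or None to fall back to the basic behavior."""
    if not applicable_cps:
        # No matching CP: the RL-learned basic behavior takes over.
        return None
    if len(applicable_cps) == 1:
        # Nothing to decide, so the CB does not learn anything here.
        return applicable_cps[0]
    # Two or more candidates: this is the only case where the CB is involved.
    return choose(applicable_cps)
```

For example, `select_behavior([])` returns `None`, while `select_behavior(["cp_a"])` returns `"cp_a"` without consulting the chooser.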
This experiment differs from Emma's first co-learning experiments (Becoming Team
Members: Identifying Interaction Patterns of Mutual Adaptation for Human-Robot
Co-Learning):
- State / action spaces are different.
- The first experiment had 4 states (also called phases back then). It also had 3 "macro" actions, which are series of actions.
- Now we have 24 states and 5 actions.
- There were also two levels back then. Now every participant will go through a total of 8 rounds of playing the task. Every round will have a different scenario, but the scenarios are grouped into two types: one in which breaking rocks can have severe negative effects, and one in which there is a brown rock that cannot be picked up. Before starting the experiment, participants will have the opportunity to practice the task in a simple scenario without the robot.
In phase 2, the agent can take advantage of the CPs that were collected in phase 1.
- What CP should the agent take given a state?
This is the only vocabulary used. "Robot", "Human", and "Victim" can only be used to
describe situations where they work as objects (yellow). They can't be used as part of
actions. What's also interesting is the function `translate_action`, which
translates user-specified actions to actual agent-executable actions. This was necessary
since the users can give ambiguous actions.
- I guess for each participant, in one scenario (4 rounds), the machine will check the applicable CPs before it executes a CP, right? And as they near the end of the 4th round, there will be more CPs saved?
  - Yes.
- Are the CPs directly executable? They can be ambiguous.
  - They are, cuz they are hard-coded. They are run row by row. The machine waits for the human to do their action before moving on to the next row.
- How do you map the collected data from the GUI to the states and actions for the contextual bandit?
  - The CB is only used when more than one applicable CP could be executed. I think this means that the state space for the CB is not the same as that of RL.
- You mentioned that the participant will get a "prompt" to describe a CP. How many of these do they get?
  - Prompts will be sent out by the end of a scenario.
- In Phase 2, what should be the strategy for choosing one out of multiple applicable CPs? UCB? Memory?
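For the last open question, one option would be plain UCB1 over the applicable CPs, with statistics kept per (state, CP) pair. This is only a sketch of that textbook algorithm, not what Emma's code does. The class `UCB1Selector` and its method names are made up for illustration.

```python
import math
from collections import defaultdict


class UCB1Selector:
    """Textbook UCB1 over the CPs applicable in a given state (hypothetical helper)."""

    def __init__(self, exploration: float = math.sqrt(2)):
        self.exploration = exploration
        self.counts = defaultdict(int)    # (state, cp) -> number of executions
        self.values = defaultdict(float)  # (state, cp) -> running mean reward

    def choose(self, state, applicable_cps):
        # Execute every applicable CP once before comparing confidence bounds.
        for cp in applicable_cps:
            if self.counts[(state, cp)] == 0:
                return cp
        total = sum(self.counts[(state, cp)] for cp in applicable_cps)

        def upper_bound(cp):
            n = self.counts[(state, cp)]
            bonus = self.exploration * math.sqrt(math.log(total) / n)
            return self.values[(state, cp)] + bonus

        return max(applicable_cps, key=upper_bound)

    def update(self, state, cp, reward):
        # Incremental mean update after the CP has been executed.
        key = (state, cp)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

Usage would be `cp = selector.choose(state, applicable_cps)` before executing, and `selector.update(state, cp, reward)` afterwards. Since the participants write fine-grained CPs, the counts will likely stay small, so the exploration bonus will dominate for a long time.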
I'm gonna make some adjustments to the existing typedb ontology.
- location
  - top of rock pile
  - above rock pile
  - bottom of rock pile
  - right / left side of rock pile
  - right / left side of field
  - on top of (object)
- object
  - large rock
  - small rock
  - brown rock
- actor
  - robot
  - human
  - victim
- action
  - move to (object)
  - move back and forth in (location)
  - stand still in (location)
  - pick up (object) in (location)
  - drop (object) in (location)
  - break (object) in (location)
- entities
  - actor
    - robot
    - human
    - victim
  - object
    - large rock
    - small rock
    - brown rock
    - rock pile
    - field
  - actor action
    - move back and forth
    - stand still
- relations
  - subclass of
  - superclass of
  - top of
  - above
  - bottom of
  - right side of
  - left side of
  - on top of
  - move to
  - has state
  - pick up
  - drop
  - break
This modified version kinda simplifies things a bit, but then some restrictions follow. Let's go through them one by one. Btw, "in" is not used anymore, since it's no longer necessary.
- entities
  - actor: superclass of "robot", "human", and "victim".
  - robot: subclass of "actor".
  - human: subclass of "actor".
  - victim: subclass of "actor".
  - object: superclass of "large rock", "small rock", "brown rock", "rock pile", and "field".
  - large rock: subclass of "object".
  - small rock: subclass of "object".
  - brown rock: subclass of "object".
  - rock pile: subclass of "object". This entity can only be a tail, and the relation that points to it has to be "top of", "above", "bottom of", "left side of", or "right side of".
  - field: subclass of "object". This entity can only be a tail, and the relation that points to it has to be "left side of" or "right side of".
  - move back and forth: subclass of "actor action". This entity can only be a tail, and the relation that points to it has to be "has state".
  - stand still: subclass of "actor action". This entity can only be a tail, and the relation that points to it has to be "has state".
- relations
  - subclass of: This relation describes the relationships between the above-mentioned entities.
  - superclass of: This relation describes the relationships between the above-mentioned entities.
  - top of: The tail of this relation has to be "rock pile".
  - above: The tail of this relation has to be "rock pile".
  - bottom of: The tail of this relation has to be "rock pile".
  - right side of: The tail of this relation has to be "rock pile" or "field".
  - left side of: The tail of this relation has to be "rock pile" or "field".
  - on top of: The tail of this relation has to be "large rock", "small rock", or "brown rock".
  - move to: The head of this relation has to be "robot" or "human". The tail of this relation has to be "large rock", "small rock", or "brown rock".
  - has state: The head of this relation has to be "robot" or "human". The tail of this relation has to be "move back and forth" or "stand still".
  - pick up: The head of this relation has to be "robot" or "human". The tail of this relation has to be "large rock", "small rock", or "brown rock".
  - drop: The head of this relation has to be "robot" or "human". The tail of this relation has to be "large rock", "small rock", or "brown rock".
  - break: The head of this relation has to be "robot" or "human". The tail of this relation has to be "large rock", "small rock", or "brown rock".
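The head / tail restrictions above are easy to check mechanically. Here is a minimal sketch of such a check over (head, relation, tail) triples, written in plain Python and not tied to the TypeDB schema. The names `CONSTRAINTS`, `TAIL_ONLY`, and `is_valid_triple` are made up for illustration, and "subclass of" / "superclass of" are left out since they only describe the type hierarchy.

```python
ROCKS = {"large rock", "small rock", "brown rock"}
AGENTS = {"robot", "human"}  # "victim" is an actor but never the head of an action

# Entities that may only appear as the tail of a triple.
TAIL_ONLY = {"rock pile", "field", "move back and forth", "stand still"}

# relation -> (allowed heads, allowed tails); None means the head is unrestricted.
CONSTRAINTS = {
    "top of": (None, {"rock pile"}),
    "above": (None, {"rock pile"}),
    "bottom of": (None, {"rock pile"}),
    "right side of": (None, {"rock pile", "field"}),
    "left side of": (None, {"rock pile", "field"}),
    "on top of": (None, ROCKS),
    "move to": (AGENTS, ROCKS),
    "has state": (AGENTS, {"move back and forth", "stand still"}),
    "pick up": (AGENTS, ROCKS),
    "drop": (AGENTS, ROCKS),
    "break": (AGENTS, ROCKS),
}


def is_valid_triple(head: str, relation: str, tail: str) -> bool:
    """Check one (head, relation, tail) triple against the restrictions above."""
    if relation not in CONSTRAINTS or head in TAIL_ONLY:
        return False
    allowed_heads, allowed_tails = CONSTRAINTS[relation]
    if allowed_heads is not None and head not in allowed_heads:
        return False
    return tail in allowed_tails
```

For example, `is_valid_triple("robot", "pick up", "large rock")` is accepted, while `is_valid_triple("victim", "pick up", "small rock")` is rejected because the head has to be "robot" or "human".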
I'll use ./user-raw-data/new
- `all_cp_messages.csv`
  - This shows when the participants have added / edited / deleted CPs.
  - I have to parse the HTML.
  - Perhaps the latest CP makes the most sense, but it's up to you to decide.
- `cp_execution.csv`
  - The column to the right of the CP (string value) is the number of ticks that it lasted. The maximum is 3020, which is about 150 seconds.
  - `"False"` is not a CP but the basic behavior, cuz there is no matching CP to run.
- `data_aggregate_complete.csv`
  - `"Time_score"` is the objective metric. The lower it is, the better it is. $$\text{Time\_score} = \text{corrected\_tick} + 100 \times \text{remaining\_rocks}$$
  - As for the `Condition`, `"C3"` is what matters to me since this is the GUI with the CP.
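The `Time_score` formula above, as a small sketch. The function name `time_score` and the parameter name `penalty_per_rock` are made up for illustration; `corrected_tick` and `remaining_rocks` are the terms from the formula.

```python
def time_score(corrected_tick: float, remaining_rocks: int, penalty_per_rock: int = 100) -> float:
    """Objective metric: lower is better. Every remaining rock costs 100 ticks."""
    return corrected_tick + penalty_per_rock * remaining_rocks


# A round that used 2500 ticks and left 3 rocks behind:
print(time_score(2500, 3))  # 2500 + 100 * 3 = 2800
```

So leaving a single rock behind is penalized as much as spending 100 extra ticks.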
- `typedb server`
- `typedb console`
- `exit`
- `database list`
- `database delete CP_ontology`
- `database create CP_ontology`
- `transaction CP_ontology schema write`, and then copy to the console, and then `commit`
- `transaction CP_ontology schema read`
  - `match $x sub thing; get $x;`
- `transaction CP_ontology data write`, and then copy to the console, and then `commit`
- `transaction CP_ontology data read`
  - `match $x isa thing; get $x;`
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Run `make test && make style && make quality` in the root repo directory, to ensure code quality.
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request