Some concerns about the training of the topology generator #10

Open
csyuhao opened this issue Sep 15, 2023 · 1 comment

Comments

@csyuhao

csyuhao commented Sep 15, 2023

Nice work! Thank you very much for your contribution to the AI safety community!

I noticed some strange behavior when training the topology generator. The code for training the topology generator is:

toponet.train()    
for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"): 
    optimizer_topo.zero_grad()
    # generate new adj_list by dr.data['adj_list']
    for gid in pset:
        SendtoCUDA(gid, [init_As, Ainputs, topomasks])    # only send the used graph items to cuda
        rst_bkdA = toponet(
            Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo')
        # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo')
        # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid])
        bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid])   # only current position in cuda
        SendtoCPU(gid, [init_As, Ainputs, topomasks])
        
    loss = forwarding(args, bkd_dr, model, allset, criterion)
    loss.backward()
    optimizer_topo.step()
    torch.cuda.empty_cache()
    
toponet.eval()
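
As a quick sanity check (a minimal sketch using only the variables from the loop above -- nothing here beyond standard PyTorch), one can verify whether any gradient reaches the generator at all by running this once right after loss.backward() and before optimizer_topo.step():

for name, p in toponet.named_parameters():
    # p.grad stays None (or all-zero) if the graph from loss back to this
    # parameter is cut, e.g. by a detach() or a hard thresholding step
    grad_norm = None if p.grad is None else p.grad.norm().item()
    print(name, grad_norm)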

When I check the parameters of the topology generator before and after training using the following snippet:

import copy
old_toponet = copy.deepcopy(toponet)

toponet.train()
for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"):
    optimizer_topo.zero_grad()
    # generate new adj_list by dr.data['adj_list']
    for gid in pset:
        SendtoCUDA(gid, [init_As, Ainputs, topomasks])    # only send the used graph items to cuda
        rst_bkdA = toponet(
            Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo')
        # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo')
        # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid])
        bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid])   # only current position in cuda
        SendtoCPU(gid, [init_As, Ainputs, topomasks])

    loss = forwarding(args, bkd_dr, model, allset, criterion)
    loss.backward()
    optimizer_topo.step()
    torch.cuda.empty_cache()

toponet.eval()


new_toponet = copy.deepcopy(toponet)

old_state_dict = old_toponet.state_dict()
new_state_dict = new_toponet.state_dict()
for name in old_state_dict:
    param_diff = new_state_dict[name] - old_state_dict[name]
    print(torch.mean(param_diff))

I found that there is no difference in the parameters after training. The log is as follows:

N nodes avg/std/min/max:        15.69/13.69/2/95
N edges avg/std/min/max:        16.20/15.01/1/103
Node degree avg/std/min/max:    2.06/0.84/0/6
Node features dim:              4
N classes:                      2
Classes:                        [0 1]
Class 0:                        400 samples
Class 1:                        1600 samples

train 1000, test 1000
Train Epoch: 1  Loss: 0.3501 (avg: 0.6249)      sec/iter: 0.09
Train Epoch: 2  Loss: 0.4396 (avg: 0.4671)      sec/iter: 0.04
Train Epoch: 3  Loss: 0.3962 (avg: 0.4762)      sec/iter: 0.05
Train Epoch: 4  Loss: 0.2415 (avg: 0.4725)      sec/iter: 0.04
Train Epoch: 5  Loss: 0.3413 (avg: 0.4318)      sec/iter: 0.05
Test set (epoch 5): Average loss: 0.3149, Accuracy: 936/1000 (93.60%)   sec/iter: 0.04
Train Epoch: 6  Loss: 0.1591 (avg: 0.4509)      sec/iter: 0.05
Train Epoch: 7  Loss: 0.2189 (avg: 0.4338)      sec/iter: 0.05
Train Epoch: 8  Loss: 0.3262 (avg: 0.4374)      sec/iter: 0.05
Train Epoch: 9  Loss: 0.4319 (avg: 0.4283)      sec/iter: 0.05
Train Epoch: 10 Loss: 0.2932 (avg: 0.4221)      sec/iter: 0.04
Test set (epoch 10): Average loss: 0.2969, Accuracy: 949/1000 (94.90%)  sec/iter: 0.04
Train Epoch: 11 Loss: 0.3764 (avg: 0.4185)      sec/iter: 0.04
Train Epoch: 12 Loss: 0.3095 (avg: 0.4208)      sec/iter: 0.05
Train Epoch: 13 Loss: 0.2180 (avg: 0.3867)      sec/iter: 0.04
Train Epoch: 14 Loss: 0.3225 (avg: 0.3997)      sec/iter: 0.05
Train Epoch: 15 Loss: 0.2932 (avg: 0.4269)      sec/iter: 0.04
Test set (epoch 15): Average loss: 0.2962, Accuracy: 953/1000 (95.30%)  sec/iter: 0.04
Train Epoch: 16 Loss: 0.2085 (avg: 0.3804)      sec/iter: 0.04
Train Epoch: 17 Loss: 0.3577 (avg: 0.4243)      sec/iter: 0.04
Train Epoch: 18 Loss: 0.2417 (avg: 0.3843)      sec/iter: 0.04
Train Epoch: 19 Loss: 0.2875 (avg: 0.3822)      sec/iter: 0.04
Train Epoch: 20 Loss: 0.2741 (avg: 0.3581)      sec/iter: 0.05
Test set (epoch 20): Average loss: 0.2789, Accuracy: 955/1000 (95.50%)  sec/iter: 0.04
initializing trigger...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 4332.37it/s]
initializing trigger...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 19990.96it/s]
Resampling step 0, bi-level optimization step 0
training topology generator: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [02:09<00:00,  6.46s/it]
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
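
One caveat about the check itself: a signed mean can be zero even when parameters do move (positive and negative changes cancel), so a stricter variant of the comparison loop above looks at the maximum absolute change instead:

for name in old_state_dict:
    param_diff = new_state_dict[name] - old_state_dict[name]
    # the max absolute difference cannot cancel out, unlike the signed mean
    print(name, param_diff.abs().max().item())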

Could you give me some suggestions about this problem? Thank you very much for any replies! :)

@jdowner212

I noticed the same issue -- toponet does not update between epochs. Also hoping for suggestions on this!
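
For anyone debugging this: a common way a generator's parameters end up frozen is a hard threshold in the forward pass, which cuts the autograd graph so no gradient ever reaches the weights. A minimal, self-contained illustration (not the repository's code -- whether the topo_thrd binarization behaves this way here is only an assumption), together with the usual straight-through workaround:

import torch

w = torch.randn(4, requires_grad=True)

# hard binarization cuts the graph: the comparison produces a tensor
# with requires_grad=False, so any loss built on it gives w no signal
hard = (w.sigmoid() > 0.5).float()
print(hard.requires_grad)   # False

# straight-through estimator: forward pass keeps the hard values,
# backward pass uses the gradient of the soft sigmoid path
soft = w.sigmoid()
hard_ste = soft + ((soft > 0.5).float() - soft).detach()
hard_ste.sum().backward()
print(w.grad)               # nonzero: gradient flows back to w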
