Some concerns about the training of the topology generator #10

Open
csyuhao opened this issue Sep 15, 2023 · 1 comment

Comments

@csyuhao

csyuhao commented Sep 15, 2023

Nice work! Thank you very much for your contribution to the AI safety community!

I noticed some strange behavior when training the topology generator. The code for training the topology generator is:

toponet.train()    
for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"): 
    optimizer_topo.zero_grad()
    # generate new adj_list by dr.data['adj_list']
    for gid in pset:
        SendtoCUDA(gid, [init_As, Ainputs, topomasks])    # only send the used graph items to cuda
        rst_bkdA = toponet(
            Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo')
        # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo')
        # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid])
        bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid])   # only current position in cuda
        SendtoCPU(gid, [init_As, Ainputs, topomasks])
        
    loss = forwarding(args, bkd_dr, model, allset, criterion)
    loss.backward()
    optimizer_topo.step()
    torch.cuda.empty_cache()
    
toponet.eval()
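
As a quick sanity check (a minimal sketch using only the variables from the loop above -- nothing here beyond standard PyTorch), one can verify whether any gradient reaches the generator at all by running this once right after loss.backward() and before optimizer_topo.step():

for name, p in toponet.named_parameters():
    # p.grad stays None (or all-zero) if the graph from loss back to this
    # parameter is cut, e.g. by a detach() or a hard thresholding step
    grad_norm = None if p.grad is None else p.grad.norm().item()
    print(name, grad_norm)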

When I check the parameters of the topology generator before and after training using the following snippet:

import copy
old_toponet = copy.deepcopy(toponet)

toponet.train()
for _ in tqdm(range(args.gtn_epochs), desc="training topology generator"):
    optimizer_topo.zero_grad()
    # generate new adj_list by dr.data['adj_list']
    for gid in pset:
        SendtoCUDA(gid, [init_As, Ainputs, topomasks])    # only send the used graph items to cuda
        rst_bkdA = toponet(
            Ainputs[gid], topomasks[gid], topo_thrd, cuda, args.topo_activation, 'topo')
        # rst_bkdA = recover_mask(nodenums[gid], topomasks[gid], 'topo')
        # bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA, init_As[gid])
        bkd_dr.data['adj_list'][gid] = torch.add(rst_bkdA[:nodenums[gid], :nodenums[gid]], init_As[gid])   # only current position in cuda
        SendtoCPU(gid, [init_As, Ainputs, topomasks])

    loss = forwarding(args, bkd_dr, model, allset, criterion)
    loss.backward()
    optimizer_topo.step()
    torch.cuda.empty_cache()

toponet.eval()


new_toponet = copy.deepcopy(toponet)

old_state_dict = old_toponet.state_dict()
new_state_dict = new_toponet.state_dict()
for name in old_state_dict:
    param_diff = new_state_dict[name] - old_state_dict[name]
    print(torch.mean(param_diff))

I found that there is no difference in the parameters after training. The log is as follows:

N nodes avg/std/min/max:        15.69/13.69/2/95
N edges avg/std/min/max:        16.20/15.01/1/103
Node degree avg/std/min/max:    2.06/0.84/0/6
Node features dim:              4
N classes:                      2
Classes:                        [0 1]
Class 0:                        400 samples
Class 1:                        1600 samples

train 1000, test 1000
Train Epoch: 1  Loss: 0.3501 (avg: 0.6249)      sec/iter: 0.09
Train Epoch: 2  Loss: 0.4396 (avg: 0.4671)      sec/iter: 0.04
Train Epoch: 3  Loss: 0.3962 (avg: 0.4762)      sec/iter: 0.05
Train Epoch: 4  Loss: 0.2415 (avg: 0.4725)      sec/iter: 0.04
Train Epoch: 5  Loss: 0.3413 (avg: 0.4318)      sec/iter: 0.05
Test set (epoch 5): Average loss: 0.3149, Accuracy: 936/1000 (93.60%)   sec/iter: 0.04
Train Epoch: 6  Loss: 0.1591 (avg: 0.4509)      sec/iter: 0.05
Train Epoch: 7  Loss: 0.2189 (avg: 0.4338)      sec/iter: 0.05
Train Epoch: 8  Loss: 0.3262 (avg: 0.4374)      sec/iter: 0.05
Train Epoch: 9  Loss: 0.4319 (avg: 0.4283)      sec/iter: 0.05
Train Epoch: 10 Loss: 0.2932 (avg: 0.4221)      sec/iter: 0.04
Test set (epoch 10): Average loss: 0.2969, Accuracy: 949/1000 (94.90%)  sec/iter: 0.04
Train Epoch: 11 Loss: 0.3764 (avg: 0.4185)      sec/iter: 0.04
Train Epoch: 12 Loss: 0.3095 (avg: 0.4208)      sec/iter: 0.05
Train Epoch: 13 Loss: 0.2180 (avg: 0.3867)      sec/iter: 0.04
Train Epoch: 14 Loss: 0.3225 (avg: 0.3997)      sec/iter: 0.05
Train Epoch: 15 Loss: 0.2932 (avg: 0.4269)      sec/iter: 0.04
Test set (epoch 15): Average loss: 0.2962, Accuracy: 953/1000 (95.30%)  sec/iter: 0.04
Train Epoch: 16 Loss: 0.2085 (avg: 0.3804)      sec/iter: 0.04
Train Epoch: 17 Loss: 0.3577 (avg: 0.4243)      sec/iter: 0.04
Train Epoch: 18 Loss: 0.2417 (avg: 0.3843)      sec/iter: 0.04
Train Epoch: 19 Loss: 0.2875 (avg: 0.3822)      sec/iter: 0.04
Train Epoch: 20 Loss: 0.2741 (avg: 0.3581)      sec/iter: 0.05
Test set (epoch 20): Average loss: 0.2789, Accuracy: 955/1000 (95.50%)  sec/iter: 0.04
initializing trigger...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 4332.37it/s]
initializing trigger...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 19990.96it/s]
Resampling step 0, bi-level optimization step 0
training topology generator: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [02:09<00:00,  6.46s/it]
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
tensor(0., device='cuda:0')
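
One caveat about the check itself: a signed mean can be zero even when parameters do move (positive and negative changes cancel), so a stricter variant of the comparison loop above looks at the maximum absolute change instead:

for name in old_state_dict:
    param_diff = new_state_dict[name] - old_state_dict[name]
    # the max absolute difference cannot cancel out, unlike the signed mean
    print(name, param_diff.abs().max().item())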

Could you give me some suggestions about this problem? Thank you very much for any replies! :)

@jdowner212

I noticed the same issue -- toponet does not update between epochs. Also hoping for suggestions on this!
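
For anyone debugging this: a common way a generator's parameters end up frozen is a hard threshold in the forward pass, which cuts the autograd graph so no gradient ever reaches the weights. A minimal, self-contained illustration (not the repository's code -- whether the topo_thrd binarization behaves this way here is only an assumption), together with the usual straight-through workaround:

import torch

w = torch.randn(4, requires_grad=True)

# hard binarization cuts the graph: the comparison produces a tensor
# with requires_grad=False, so any loss built on it gives w no signal
hard = (w.sigmoid() > 0.5).float()
print(hard.requires_grad)   # False

# straight-through estimator: forward pass keeps the hard values,
# backward pass uses the gradient of the soft sigmoid path
soft = w.sigmoid()
hard_ste = soft + ((soft > 0.5).float() - soft).detach()
hard_ste.sum().backward()
print(w.grad)               # nonzero: gradient flows back to w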
