"ValueError: need at least one array to concatenate" when using remora model train on test data #197

Open
spoweekkk opened this issue Dec 3, 2024 · 8 comments

Comments

@spoweekkk

While running the pipeline on the test data, I hit the "Not enough chunks" error, so I followed the advice you gave previously and used this command:

"remora model train train_dataset.jsn --model ~/gpfs1/Software/remora/models/ConvLSTM_w_ref.py --chunk-context 50 50 --output-path train_results --overwrite --num-test-chunks 200"

I got the error:
"[17:28:09.585] Seed selected is 711195172
[17:28:09.587] Loading dataset from Remora dataset config
[17:28:09.604] Dataset summary:
size : 415
kmer context bases : (4, 4)
chunk context : (50, 50)
reverse signal : False
chunk extract base start : False
chunk extract offset : 0
pa scaling : None
sig map refiner : Loaded 9-mer table with 7 central position. Rough re-scaling will be executed.
batches preloaded : False
is modbase dataset? : True
mod bases : ['m']
mod long names : ['5mC']
motifs : [('CG', 0)]

[17:28:09.605] Loading model
[17:28:09.613] Model structure:
network(
(sig_conv1): Conv1d(1, 4, kernel_size=(5,), stride=(1,))
(sig_bn1): BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(sig_conv2): Conv1d(4, 16, kernel_size=(5,), stride=(1,))
(sig_bn2): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(sig_conv3): Conv1d(16, 64, kernel_size=(9,), stride=(3,))
(sig_bn3): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(seq_conv1): Conv1d(36, 16, kernel_size=(5,), stride=(1,))
(seq_bn1): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(seq_conv2): Conv1d(16, 64, kernel_size=(13,), stride=(3,))
(seq_bn2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(merge_conv1): Conv1d(128, 64, kernel_size=(5,), stride=(1,))
(merge_bn): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(lstm1): LSTM(64, 64)
(lstm2): LSTM(64, 64)
(fc): Linear(in_features=64, out_features=2, bias=True)
(dropout): Dropout(p=0.3, inplace=False)
)
[17:28:09.617] Gradients will be clipped (by value) at 0.00 MADs above the median of the last 1000 gradient maximums.
[17:28:09.765] Params (k) 134.08 | MACs (M) 7327.45
[17:28:09.765] Preparing training settings
[17:28:09.766] Training optimizer and scheduler settings: TrainOpts(epochs=100, early_stopping=10, optimizer_str='AdamW', opt_kwargs=(('weight_decay', 0.0001, 'float'),), learning_rate=0.001, lr_scheduler_str='CosineAnnealingLR', lr_scheduler_kwargs=(('T_max', 100, 'int'), ('eta_min', 1e-06, 'float')), lr_cool_down_epochs=5, lr_cool_down_lr=1e-07)
[17:28:10.865] Dataset loaded with labels: control:205; 5mC:210
[17:28:10.865] Train labels: control:105; 5mC:110
[17:28:10.865] Held-out validation labels: control:0; 5mC:0
[17:28:10.865] Training set validation labels: control:0; 5mC:0
[17:28:10.865] Running initial validation
Batches: 0it [00:00, ?it/s]
Traceback (most recent call last):
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/remora/bin/remora", line 8, in <module>
sys.exit(run())
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/remora/lib/python3.8/site-packages/remora/main.py", line 71, in run
cmd_func(args)
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/remora/lib/python3.8/site-packages/remora/parsers.py", line 1377, in run_model_train
train_model(
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/remora/lib/python3.8/site-packages/remora/train_model.py", line 379, in train_model
val_metrics = val_fp.validate_model(
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/remora/lib/python3.8/site-packages/remora/validate.py", line 282, in validate_model
ms = self.run_validation(
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/remora/lib/python3.8/site-packages/remora/validate.py", line 247, in run_validation
all_outputs = np.concatenate(all_outputs, axis=0)
File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate"

Could you give me some advice on this error?

@marcus1487
Collaborator

It appears that the number of test chunks is indeed 0, even though it looks like there are enough chunks. I'm not sure why that would be. Could you try extracting a smaller number of test chunks, say 50, to see if this resolves the issue?
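
For context, the last frame of the traceback is NumPy refusing to concatenate an empty list, which is what happens when the validation loop yields no batches (the log shows `Batches: 0it`). The following is a minimal sketch that is independent of Remora's code; `collect_outputs` is a hypothetical helper used only for illustration, not a function from the library.

```python
import numpy as np


def collect_outputs(batches):
    """Hypothetical helper: stack per-batch outputs into one array."""
    all_outputs = [np.asarray(b) for b in batches]
    if not all_outputs:
        # Fail with a clearer message instead of letting np.concatenate raise.
        raise RuntimeError("validation produced zero batches")
    return np.concatenate(all_outputs, axis=0)


# Concatenating an empty list reproduces the message from the traceback.
try:
    np.concatenate([], axis=0)
except ValueError as err:
    message = str(err)
    print(message)
```

A guard like the one in `collect_outputs` would not fix the underlying problem, but it would make an empty validation set easier to spot.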

@spoweekkk
Author

I tried a smaller number of test chunks, but the error remains. Strange.

@spoweekkk
Author

It seems that when I add "--read-batches-from-disk" to the command line, the number of test chunks is no longer 0, but the error remains, as follows:
[04:50:18.074] Seed selected is 1214520150
[04:50:18.077] Loading dataset from Remora dataset config
[04:50:18.209] Dataset summary:
size : 1,296,769
kmer context bases : (4, 4)
chunk context : (200, 200)
reverse signal : False
chunk extract base start : False
chunk extract offset : 0
pa scaling : None
sig map refiner : Loaded 9-mer table with 7 central position. Rough re-scaling will be executed.
batches preloaded : False
is modbase dataset? : True
mod bases : ['a']
mod long names : ['aC']
motifs : [('C', 0)]

[04:50:18.211] Loading model
[04:50:18.234] Model structure:
network(
(sig_conv1): Conv1d(1, 4, kernel_size=(5,), stride=(1,))
(sig_bn1): BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(sig_conv2): Conv1d(4, 16, kernel_size=(5,), stride=(1,))
(sig_bn2): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(sig_conv3): Conv1d(16, 64, kernel_size=(9,), stride=(3,))
(sig_bn3): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(seq_conv1): Conv1d(36, 16, kernel_size=(5,), stride=(1,))
(seq_bn1): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(seq_conv2): Conv1d(16, 64, kernel_size=(13,), stride=(3,))
(seq_bn2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(merge_conv1): Conv1d(128, 64, kernel_size=(5,), stride=(1,))
(merge_bn): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(lstm1): LSTM(64, 64)
(lstm2): LSTM(64, 64)
(fc): Linear(in_features=64, out_features=2, bias=True)
(dropout): Dropout(p=0.3, inplace=False)
)
[04:50:18.234] Gradients will be clipped (by value) at 0.00 MADs above the median of the last 1000 gradient maximums.
[04:50:18.889] Params (k) 134.08 | MACs (M) 36395.12
[04:50:18.889] Preparing training settings
[04:50:18.890] Training optimizer and scheduler settings: TrainOpts(epochs=100, early_stopping=10, optimizer_str='AdamW', opt_kwargs=(('weight_decay', 0.0001, 'float'),), learning_rate=0.001, lr_scheduler_str='CosineAnnealingLR', lr_scheduler_kwargs=(('T_max', 100, 'int'), ('eta_min', 1e-06, 'float')), lr_cool_down_epochs=5, lr_cool_down_lr=1e-07)
[04:50:19.789] Dataset loaded with labels: control:1,294,144; aC:2,625
[04:50:19.797] Train labels: control:1,293,944; aC:2,425
[04:50:19.797] Held-out validation labels: control:200; aC:200
[04:50:19.797] Training set validation labels: control:200; aC:200
[04:50:19.798] Running initial validation
Batches: 0it [00:00, ?it/s]
Traceback (most recent call last):
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/nanopore/bin/remora", line 8, in <module>
sys.exit(run())
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/nanopore/lib/python3.8/site-packages/remora/main.py", line 71, in run
cmd_func(args)
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/nanopore/lib/python3.8/site-packages/remora/parsers.py", line 1377, in run_model_train
train_model(
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/nanopore/lib/python3.8/site-packages/remora/train_model.py", line 379, in train_model
val_metrics = val_fp.validate_model(
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/nanopore/lib/python3.8/site-packages/remora/validate.py", line 282, in validate_model
ms = self.run_validation(
File "/lustre2/jdhan_pkuhpc/common/mamba/envs/nanopore/lib/python3.8/site-packages/remora/validate.py", line 247, in run_validation
all_outputs = np.concatenate(all_outputs, axis=0)
File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate

@spoweekkk
Author

I found that the error always occurs when "--num-test-chunks" is used, but when there are not enough chunks I have to add this parameter to the command line.

@spoweekkk
Author

When I set --num-test-chunks to 2500 it works, but if --num-test-chunks is less than 2000, the error is raised.

@TKsh6

TKsh6 commented Dec 31, 2024

I hit the same error, but in my case it happened after the first epoch:

(remora) syl@asus:~/5mc/04_ecoli$ remora model train \
> 04_remora_dataset/train_dataset.jsn \
> --read-batches-from-disk \
> --output-path 04_remora_dataset/remora_train_results \
> --overwrite \
> --model /mnt/raid/syl/software/remora/remora/models/ConvLSTM_w_ref.py \
> --seed 213 \
> --device 0 \
> --chunk-context 100 100 \
> --num-test-chunks 5000 \
> --batch-size 512 \
> --lr 0.00025
[15:42:46.661] Seed selected is 213
[15:42:46.680] Loading dataset from Remora dataset config
[15:42:46.757] Dataset summary:
                     size : 114,023
       kmer context bases : (4, 4)
            chunk context : (100, 100)
           reverse signal : False
 chunk extract base start : False
     chunk extract offset : 0
               pa scaling : None
          sig map refiner : Loaded 6-mer table with 3 central position. Rough re-scaling will be executed.
        batches preloaded : False
      is modbase dataset? : True
                mod bases : ['m']
           mod long names : ['5mC']
                   motifs : [('CG', 0)]

[15:42:46.757] Loading model
[15:42:46.764] Model structure:
network(
  (sig_conv1): Conv1d(1, 4, kernel_size=(5,), stride=(1,))
  (sig_bn1): BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (sig_conv2): Conv1d(4, 16, kernel_size=(5,), stride=(1,))
  (sig_bn2): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (sig_conv3): Conv1d(16, 64, kernel_size=(9,), stride=(3,))
  (sig_bn3): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (seq_conv1): Conv1d(36, 16, kernel_size=(5,), stride=(1,))
  (seq_bn1): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (seq_conv2): Conv1d(16, 64, kernel_size=(13,), stride=(3,))
  (seq_bn2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (merge_conv1): Conv1d(128, 64, kernel_size=(5,), stride=(1,))
  (merge_bn): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (lstm1): LSTM(64, 64)
  (lstm2): LSTM(64, 64)
  (fc): Linear(in_features=64, out_features=2, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
[15:42:46.764] Gradients will be clipped (by value) at 0.00 MADs above the median of the last 1000 gradient maximums.
[15:42:46.861] Params (k) 134.08 | MACs (M) 4299.17
[15:42:46.861] Preparing training settings
[15:42:47.072] Training optimizer and scheduler settings: TrainOpts(epochs=100, early_stopping=10, optimizer_str='AdamW', opt_kwargs=(('weight_decay', 0.0001, 'float'),), learning_rate=0.00025, lr_scheduler_str='CosineAnnealingLR', lr_scheduler_kwargs=(('T_max', 100, 'int'), ('eta_min', 1e-06, 'float')), lr_cool_down_epochs=5, lr_cool_down_lr=1e-07)
[15:42:47.554] Dataset loaded with labels: control:57,783; 5mC:56,240
[15:42:47.555] Train labels: control:55,283; 5mC:53,740
[15:42:47.555] Held-out validation labels: control:2,500; 5mC:2,500
[15:42:47.555] Training set validation labels: control:2,500; 5mC:2,500
[15:42:47.555] Running initial validation
Batches: 9it [00:00, 21.57it/s]
Batches: 9it [00:00, 56.70it/s]
[15:42:48.142] Start training
Epochs:   0%|                                                         | 0/100 [00:00<?, ?it/s, acc_train=0.5035, acc_val=0.5000, loss_train=0.694342, loss_val=0.694732
Traceback (most recent call last):██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19532/19532
  File "/mnt/raid/syl/.conda/envs/remora/bin/remora", line 8, in <module>
    sys.exit(run())
  File "/mnt/raid/syl/.conda/envs/remora/lib/python3.8/site-packages/remora/main.py", line 71, in run
    cmd_func(args)
  File "/mnt/raid/syl/.conda/envs/remora/lib/python3.8/site-packages/remora/parsers.py", line 1377, in run_model_train
    train_model(
  File "/mnt/raid/syl/.conda/envs/remora/lib/python3.8/site-packages/remora/train_model.py", line 515, in train_model
    val_metrics = val_fp.validate_model(
  File "/mnt/raid/syl/.conda/envs/remora/lib/python3.8/site-packages/remora/validate.py", line 282, in validate_model
    ms = self.run_validation(
  File "/mnt/raid/syl/.conda/envs/remora/lib/python3.8/site-packages/remora/validate.py", line 247, in run_validation
    all_outputs = np.concatenate(all_outputs, axis=0)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: need at least one array to concatenate
Epochs:   0%|                                                         | 0/100 [28:35<?, ?it/s, acc_train=0.5035, acc_val=0.5000, loss_train=0.694342, loss_val=0.694732]
Epoch Progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19532/19532

I tried to change --chunk-context and --num-test-chunks, but it didn't work, so how did you solve this problem? @marcus1487 @spoweekkk

@spoweekkk
Author

I suspect this error is related to the device. I changed --num-test-chunks from 400 to 2100 and it worked.

@marcus1487
Collaborator

I am not able to reproduce this error internally. We are working on a rather large re-write of the dataset logic internally and should have this sorted out in the next release. I am guessing that this has to do with the size of the various core datasets and that the logic to extract a small subset of these small datasets is setting something to a size of 0. If you can provide your test core datasets here I could look into whether I can reproduce this error in the current release and confirm whether the internal branch fixes the issue. Alternatively I might be able to test with the results of the epoch summary (which should be in the output directory). This way I can at least test the sizes provided and see if there is some faulty logic in the remora dataset code.
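
As an illustration of how a size of 0 could arise, here is a small sketch of one hypothesis. It is not confirmed and not taken from Remora's code: it assumes a loader that yields only complete batches. Under that assumption the batch count is an integer division, which agrees with the `Batches: 9it` line reported above for 5,000 validation chunks at `--batch-size 512`. The function name `full_batches` is hypothetical. The batch size of the failing runs is not shown in the thread, so the second call is only illustrative, and this does not explain the failure reported after the first epoch.

```python
def full_batches(num_chunks: int, batch_size: int) -> int:
    """Batches produced if only complete batches are yielded (an assumption)."""
    return num_chunks // batch_size


# 5,000 validation chunks at batch size 512, as in the log with "Batches: 9it".
print(full_batches(5000, 512))

# A validation set smaller than the batch size would yield no batches at all,
# leaving nothing to concatenate. The value 512 is illustrative only.
print(full_batches(400, 512))
```

If this is what is happening, comparing the held-out validation counts in the log with the batch size in use would be a quick check before a long training run.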
