
Drop log handling in sliced epochs not intuitive #9726

Open · jsosulski opened this issue Sep 10, 2021 · 7 comments · May be fixed by #9768

@jsosulski (Contributor)

Describe the bug

Using drop_log_stats() or drop_log is unintuitive when working with sliced epochs.

Steps to reproduce

MWE

import mne
import numpy as np

np.random.seed(10)
n_epochs, n_channels, n_timepoints = 10, 4, 21
ch_names = [f"ch_{i}" for i in range(n_channels)]
fake_data = 1e-6 * np.random.random((n_epochs, n_channels, n_timepoints))

# Have the first three epochs exceed minmax in one channel
fake_data[0:3, 1, 0] = 200e-6

info = mne.create_info(ch_names=ch_names, sfreq=40)
epo = mne.EpochsArray(fake_data, info)
# This should drop the first three epochs
epo.drop_bad(reject=dict(misc=100e-6))

# Prints 30% as expected
print(f"I expect 30% and get: {epo.drop_log_stats()}%")

# select only the last epoch
subepos = epo[-1:]
# Prints 75%
print(f"I expect 0% or maybe 30% but get: {subepos.drop_log_stats()}%")

# Drop log contains IGNORED only for events that were not dropped initially
print(subepos.drop_log)
# Output is:
# (('ch_1',), ('ch_1',), ('ch_1',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ())

Expected results

I would expect drop_log_stats either to report the initial drop ratio, i.e. 30%, or, after slicing, to report only the epochs dropped within the slice, i.e. 0%. However, the latter is hard to compute for non-trivial drop/slice combinations.

Actual results

MNE remembers how many epochs were dropped initially, then excludes the entries marked as IGNORED and reports a drop rate of 75%: 3 epochs were dropped and the slice contains 1 non-dropped epoch, so 3 of the 4 counted entries are drops.

I suspect that when slicing epochs, the __getitem__ implementation only inserts ('IGNORED',) into the drop log when there is no entry yet.
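The 75% figure can be reproduced outside MNE. This is a sketch of how the statistic plausibly arises from the sliced drop log above, assuming drop_log_stats skips entries equal to ('IGNORED',) and divides the drops by the remaining count (an assumption about its internals, not a copy of it):

```python
# The sliced drop log printed in the MWE above
drop_log = (('ch_1',),) * 3 + (('IGNORED',),) * 6 + ((),)

# Entries equal to exactly ('IGNORED',) are excluded from the statistic
considered = [d for d in drop_log if d != ('IGNORED',)]  # 4 entries remain
dropped = [d for d in considered if len(d) > 0]          # 3 of them are drops

print(100 * len(dropped) / len(considered))  # 75.0
```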

Additional information

Platform:      Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python:        3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Executable:    /home/jan/dev/python/spot/spot_venv/bin/python
CPU:           x86_64: 8 cores
Memory:        Unavailable (requires "psutil" package)
mne:           0.24.dev0
numpy:         1.21.2 {blas=openblas, lapack=openblas}
scipy:         1.7.1
matplotlib:    3.4.3 {backend=module://backend_interagg}
sklearn:       0.24.2
numba:         Not found
nibabel:       3.2.1
nilearn:       Not found
dipy:          Not found
cupy:          Not found
pandas:        1.3.2
mayavi:        Not found
pyvista:       Not found
vtk:           Not found
@jsosulski jsosulski added the BUG label Sep 10, 2021
@agramfort (Member)

I agree. Here are some more details:

In [3]: epo[-1:].drop_log
Out[3]:
(('ch_1',),
 ('ch_1',),
 ('ch_1',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ())

basically the first 3 entries do not get the IGNORED tag, so only 4 epochs are counted and 75% of them appear dropped...

what would you suggest we do?

@jsosulski (Contributor, Author)

I don't think there is a straightforward, satisfying solution.
Since all bad epochs are already dropped, we can only slice-index into the "good" epochs of our epochs object anyway. Hence slicing would somehow have to reset the drop log, i.e. also put 'IGNORED' into the non-empty tuple entries of the original drop_log, and the stats would then show 0% dropped.

@jsosulski (Contributor, Author)

After thinking about it, an acceptable solution could be to simply append 'IGNORED' to the per-epoch channel tuple when the epoch was rejected, e.g. (('IGNORED', 'ch_1'), ...), but I don't know whether that breaks other code that checks for exactly ('IGNORED',) rather than checking whether 'IGNORED' is contained.
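One way to picture this proposal is a small helper that marks out-of-slice entries by appending 'IGNORED' so the original rejection reason survives. The function name slice_drop_log and the kept_indices parameter are made up here for illustration; this is not MNE's actual implementation:

```python
def slice_drop_log(drop_log, kept_indices):
    """Hypothetical sketch: mark every entry outside the slice as ignored,
    appending 'IGNORED' so the original rejection reason is preserved."""
    new_log = []
    for i, entry in enumerate(drop_log):
        if i in kept_indices or 'IGNORED' in entry:
            new_log.append(entry)  # kept epoch, or already marked as ignored
        else:
            new_log.append(entry + ('IGNORED',))  # e.g. ('ch_1', 'IGNORED')
    return tuple(new_log)

print(slice_drop_log((('ch_1',), ('IGNORED',), ()), kept_indices={2}))
# (('ch_1', 'IGNORED'), ('IGNORED',), ())
```

With both tags kept, downstream code could still recover the original rejection reasons while the stats treat the entry as ignored.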

@agramfort (Member)

agramfort commented Sep 20, 2021 via email

@jsosulski (Contributor, Author)

When implementing this, two tests in epochs.py fail, caused by equalize_event_counts and by direct comparisons with ('IGNORED',); for now I cannot fix, or fully understand, the equalize_event_counts method.

I could push a draft PR, or push the change as a regular PR, but CI will then fail.

@larsoner (Member)

Feel free to push if you are looking for help/collaboration, or want someone else to take a stab at fixing it.

The fix I would use: instead of comparing directly, effectively d == ('IGNORED',) for d in drop_log, compare with 'IGNORED' in d for d in drop_log or so.
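To illustrate the difference between the two checks on a drop log containing a combined entry of the kind proposed above (the data here is a made-up example, not real MNE output):

```python
drop_log = (('ch_1', 'IGNORED'), ('IGNORED',), ())

# Equality check: misses the entry that also carries a rejection reason
print(sum(d == ('IGNORED',) for d in drop_log))  # 1

# Membership check: catches every entry tagged as ignored
print(sum('IGNORED' in d for d in drop_log))     # 2
```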

@jsosulski jsosulski linked a pull request Sep 22, 2021 that will close this issue
@jsosulski (Contributor, Author)

I pushed a draft PR, #9768. I have an issue with one unit test, which I think (maybe?) currently has a bug itself, but I am not sure and want to double-check before changing a unit test.

@hoechenberger hoechenberger added this to the 1.0 milestone Oct 26, 2021
@larsoner larsoner modified the milestones: 1.0, 1.1 Feb 11, 2022
@larsoner larsoner modified the milestones: 1.1, 1.2 Jun 4, 2022
@larsoner larsoner modified the milestones: 1.2, 1.3 Sep 13, 2022
@larsoner larsoner removed this from the 1.3 milestone Dec 8, 2022