
Drop log handling in sliced epochs not intuitive #9726

Open · jsosulski opened this issue Sep 10, 2021 · 7 comments · May be fixed by #9768

@jsosulski (Contributor)

Describe the bug

Using drop_log_stats() or drop_log is unintuitive when working with sliced epochs.

Steps to reproduce

MWE

import mne
import numpy as np

np.random.seed(10)
n_epochs, n_channels, n_timepoints = 10, 4, 21
ch_names = [f"ch_{i}" for i in range(n_channels)]
fake_data = 1e-6 * np.random.random((n_epochs, n_channels, n_timepoints))

# Have the first three epochs exceed minmax in one channel
fake_data[0:3, 1, 0] = 200e-6

info = mne.create_info(ch_names=ch_names, sfreq=40)
epo = mne.EpochsArray(fake_data, info)
# This should drop the first three epochs
epo.drop_bad(reject=dict(misc=100e-6))

# Prints 30% as expected
print(f"I expect 30% and get: {epo.drop_log_stats()}%")

# select only the last epoch
subepos = epo[-1:]
# Prints 75%
print(f"I expect 0% or maybe 30% but get: {subepos.drop_log_stats()}%")

# Drop log contains IGNORED only for events that were not dropped initially
print(subepos.drop_log)
# Output is:
# (('ch_1',), ('ch_1',), ('ch_1',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ('IGNORED',), ())

Expected results

I would expect drop_log_stats either to report the initial drop ratio, i.e. 30%, or, after slicing, to report only the epochs dropped within the slice, i.e. 0%. However, the latter is hard to compute for non-trivial drop/slice combinations.

Actual results

MNE remembers how many epochs were dropped initially, then excludes the entries marked as IGNORED and reports a drop rate of 75%: 3 epochs were dropped and the slice contains 1 non-dropped epoch, so 3 of the 4 counted entries are drops.

I suspect that when slicing epochs, the __getitem__ implementation only inserts ('IGNORED',) into the drop log when there is no entry yet.
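The 75% figure can be reproduced outside MNE. This is a sketch of how the statistic plausibly arises from the sliced drop log above, assuming drop_log_stats skips entries equal to ('IGNORED',) and divides the drops by the remaining count (an assumption about its internals, not a copy of it):

```python
# The sliced drop log printed in the MWE above
drop_log = (('ch_1',),) * 3 + (('IGNORED',),) * 6 + ((),)

# Entries equal to exactly ('IGNORED',) are excluded from the statistic
considered = [d for d in drop_log if d != ('IGNORED',)]  # 4 entries remain
dropped = [d for d in considered if len(d) > 0]          # 3 of them are drops

print(100 * len(dropped) / len(considered))  # 75.0
```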

Additional information

Platform:      Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python:        3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0]
Executable:    /home/jan/dev/python/spot/spot_venv/bin/python
CPU:           x86_64: 8 cores
Memory:        Unavailable (requires "psutil" package)
mne:           0.24.dev0
numpy:         1.21.2 {blas=openblas, lapack=openblas}
scipy:         1.7.1
matplotlib:    3.4.3 {backend=module://backend_interagg}
sklearn:       0.24.2
numba:         Not found
nibabel:       3.2.1
nilearn:       Not found
dipy:          Not found
cupy:          Not found
pandas:        1.3.2
mayavi:        Not found
pyvista:       Not found
vtk:           Not found
@jsosulski jsosulski added the BUG label Sep 10, 2021
@agramfort (Member)

I agree. Here are some more details:

In [3]: epo[-1:].drop_log
Out[3]:
(('ch_1',),
 ('ch_1',),
 ('ch_1',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ('IGNORED',),
 ())

basically the first 3 entries do not get the IGNORED tag, so only 4 epochs are counted and 75% of them appear dropped...

what would you suggest we do?

@jsosulski (Contributor, Author)

I don't think there is a straightforward, satisfying solution.
Since all bad epochs are already dropped, we can only slice-index into the "good" epochs of our epochs object anyway. Hence slicing would somehow have to reset the drop log, i.e. also put 'IGNORED' into the non-empty tuple entries of the original drop_log, and the stats would then show 0% dropped.

@jsosulski (Contributor, Author)

After thinking about it, an acceptable solution could be to simply append 'IGNORED' to the per-epoch channel tuple when the epoch was rejected, e.g. (('IGNORED', 'ch_1'), ...), but I don't know whether that breaks other code that checks for exactly ('IGNORED',) rather than checking whether 'IGNORED' is contained.
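One way to picture this proposal is a small helper that marks out-of-slice entries by appending 'IGNORED' so the original rejection reason survives. The function name slice_drop_log and the kept_indices parameter are made up here for illustration; this is not MNE's actual implementation:

```python
def slice_drop_log(drop_log, kept_indices):
    """Hypothetical sketch: mark every entry outside the slice as ignored,
    appending 'IGNORED' so the original rejection reason is preserved."""
    new_log = []
    for i, entry in enumerate(drop_log):
        if i in kept_indices or 'IGNORED' in entry:
            new_log.append(entry)  # kept epoch, or already marked as ignored
        else:
            new_log.append(entry + ('IGNORED',))  # e.g. ('ch_1', 'IGNORED')
    return tuple(new_log)

print(slice_drop_log((('ch_1',), ('IGNORED',), ()), kept_indices={2}))
# (('ch_1', 'IGNORED'), ('IGNORED',), ())
```

With both tags kept, downstream code could still recover the original rejection reasons while the stats treat the entry as ignored.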

@agramfort (Member)

agramfort commented Sep 20, 2021 via email

@jsosulski (Contributor, Author)

When implementing this, two tests in epochs.py fail, caused by equalize_event_counts and by direct comparisons with ('IGNORED',); for now I cannot fix, or fully understand, the equalize_event_counts method.

I could push a draft PR, or push the change as a regular PR, but CI will then fail.

@larsoner (Member)

Feel free to push if you are looking for help/collaboration, or want someone else to take a stab at fixing it.

The fix I would use: instead of comparing directly, effectively d == ('IGNORED',) for d in drop_log, compare with 'IGNORED' in d for d in drop_log or so.
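To illustrate the difference between the two checks on a drop log containing a combined entry of the kind proposed above (the data here is a made-up example, not real MNE output):

```python
drop_log = (('ch_1', 'IGNORED'), ('IGNORED',), ())

# Equality check: misses the entry that also carries a rejection reason
print(sum(d == ('IGNORED',) for d in drop_log))  # 1

# Membership check: catches every entry tagged as ignored
print(sum('IGNORED' in d for d in drop_log))     # 2
```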

@jsosulski jsosulski linked a pull request Sep 22, 2021 that will close this issue
@jsosulski (Contributor, Author)

I pushed a draft PR, #9768. I have an issue with one unit test, which I think (maybe?) currently has a bug itself, but I am not sure and want to double-check before changing a unit test.

@hoechenberger hoechenberger added this to the 1.0 milestone Oct 26, 2021
@larsoner larsoner modified the milestones: 1.0, 1.1 Feb 11, 2022
@larsoner larsoner modified the milestones: 1.1, 1.2 Jun 4, 2022
@larsoner larsoner modified the milestones: 1.2, 1.3 Sep 13, 2022
@larsoner larsoner removed this from the 1.3 milestone Dec 8, 2022