Fix drop log #9768

jsosulski · 2021-09-22T09:17:22Z

Reference issue

What does this implement/fix?

Instead of adding 'IGNORED' only to empty tuples in the drop log, this will append it to all drop log entries when slicing an epoch object to a subset.

Additional information

I fixed one epoch unit test, which was straight forward, however the second one that fails is test_drop_epochs_mult where there is a section I don't really get:

mne-python/mne/tests/test_epochs.py

Lines 2367 to 2373 in 6c0006d

    
           for d1, d2 in zip(epochs1.drop_log, epochs2.drop_log): 
        
               if d1 == ('IGNORED',): 
        
                   assert (d2 == ('IGNORED',)) 
        
               if d1 != ('IGNORED',) and d1 != []: 
        
                   assert ((d2 == d1) or (d2 == ('IGNORED',))) 
        
               if d1 == []: 
        
                   assert (d2 == [])

There are several comparisons to empty lists [] which, as far as I understand the drop log correctly, can never be true/false as the entries can only be empty tuples, not empty lists. I could take a stab at fixing this unit test as well, once I understand it correctly.

Also currently I assume that a drop reason should be present only once in a drop log entry, however, one could argue that multiples are okay, to e.g. reconstruct the order of subset selections if for some reason someone would be interested in this.

OT: Probably too big of a change, but using 'UNSELECTED' or 'DESELECTED' as a drop reason when slicing epochs would make the drop log explain more precisely why the epochs were dropped. One could then add 'IGNORED' as a special case that incorporates 'UNSELECTED' and other common reasons, or just use an enum.

IGNORED tags

larsoner · 2021-09-22T15:18:10Z

There are several comparisons to empty lists [] which, as far as I understand the drop log correctly, can never be true/false as the entries can only be empty tuples, not empty lists.

Yes this looks like a bug to me

Also currently I assume that a drop reason should be present only once in a drop log entry

Agreed, I wouldn't bother appending more and more 'IGNORED's with each new __getitem__ call, which it sounds like would be the resulting effect...

OT: Probably too big of a change

Agreed we shouldn't go this route unless we really need to / have strong motivating use cases at some point.

larsoner · 2021-10-22T21:20:47Z

@jsosulski FYI we're planning on releasing 0.24 in just over a week. Do you have time to work on this in that time frame? If not, do you want to work on it afterward for the next release, or would you prefer someone else try to help / take over?

jsosulski · 2021-10-25T05:47:13Z

I will check today evening if I can fix the unit tests while still keeping their semantic logic the same as before. If not I will let you know.

jsosulski · 2021-10-26T06:00:57Z

I managed to fix the unit test for my case but broke several others by doing so. So either I will retry for a future release, or someone else can try it.

larsoner · 2021-10-26T12:22:44Z

mne/utils/mixin.py

+                select_all = list(range(len(inst.drop_log)))
+                for k in np.setdiff1d(select_all, key_selection):
+                    if reason not in drop_log[k]:
+                        drop_log[k] += (reason,)


I think this fix has an issue. Take for example, epochs1 = Epochs(raw, events, dict(a=1, b=2)) we could drop some events for event type a, e.g., due to MEG2663. Then imagine we do epochs1['a']. When this happens, this code will go over all indices that are not in selection, but we know that some of the ones not in selection were originally a type -- and these will end up having 'IGNORED' added to their drop log even though they shouldn't.

I'm not sure of a good way to fix this, but I'll push a test in a separate PR that exposes the problem in this one at least.

You are right. Maybe I could add a case differentiation depending on whether the selection was done by class/event or by index. In the latter case, this issue cannot appear as we cannot select dropped epochs by index.

Btw, am I missing any other epochs[idx] possibilities? I.e. idx can be: string -> select by one or multiple events or it can be: slice/int/range -> select by epoch index.
I just saw in the docs idx can be a pandas query, in which case I dont know if unselected epochs should get the 'IGNORED' tag or not.

m I missing any other epochs[idx] possibilities

For this first I would look at the doc

https://mne.tools/stable/generated/mne.Epochs.html#mne.Epochs.__getitem__

idx can be a pandas query, in which case I dont know if unselected epochs should get the 'IGNORED' tag or not.

... then for the real answer to all these dig into mne.utils.mixin.py:GetEpochsMixin._getitem

jsosulski · 2021-11-25T18:30:31Z

After some detachment of this issue, I think I was trying to misuse the droplog for my purposes. epo.selection should have all the info I need and I could subselect the droplog that is passed to _drop_log_stats.

Do you prefer I force push my changes (needed to rebase locally) to this PR, or close it and create a fresh one?

larsoner · 2021-11-25T19:07:47Z

Force push is better I think, keeps all discussion in one place

jsosulski · 2021-11-26T07:59:07Z

Nevermind, I cant figure out a way to make this work without re-structuring the drop_log / selection / events setup, because lets say you have first five epochs as target events, and the next five epoch as nontargets, and the last 2 of each class are bad:

epo  # 5 target epochs, 5 nontarget epochs
epo.drop_bad()  # 3 target epochs, 3 nontarget epochs
epo.selection  #  is now [0, 1 ,2, 5, 6, 7]
len(epo.events)  # is now 6
epo.drop_log  # is now ((), (), (), ('d'), ('d'), (), (), (), ('d'), ('d'))
epo_t = epo['target']  # 3 target epochs, 0 nontarget epochs
epo_t.selection  # is now [0, 1, 2]
# Current MNE Version:
epo_t.drop_log  # is now ((), (), (), ('d'), ('d'), ('IGNORED'), ('IGNORED'), ('IGNORED'), ('d'), ('d'))
# This PR as of now:
epo_t.drop_log  # is now ((), (), (), ('IGNORED','d'), ('IGNORED','d'), ('IGNORED'), ('IGNORED'), ('IGNORED'), ('IGNORED','d'), ('IGNORED','d'))

A workaround would be to call drop_bad() only after all subset selections have been done.

What I would need is to have the event-wise information for all ('d') drop log entries, but the events have been dropped as far as I can tell, and I can no longer conclude which of the drops belonged to which event. Or am I missing something?

The only solution I can think of is to have the original events array somewhere in the epochs object available. Without this information I think only having drop_log_stats() reset to 0 after each Epochs subselection is feasible.

larsoner · 2021-11-28T18:34:17Z

The only solution I can think of is to have the original events array somewhere in the epochs object available

For this specific use case could you make use of write_events to write a little companion -eve.fif or -eve.txt file alongside your epochs? This could be a suitable workaround for now, and if lots of people request this feature as well we could start writing the original events, too. (We try not to add too many things to objects, so it would be nice to see if doing so would benefit other use cases / users rather than adding it here/now.)

Without this information I think only having drop_log_stats() reset to 0 after each Epochs subselection is feasible.

I could see a epochs.reset_selection() or reset_drop_log being generally useful and not too hard to support. Would this be another different way to get the behavior/information you need? Basically it would make any given Epochs have epochs.selection = np.arange(len(epochs.events)) and epochs.drop_log = ((),) * len(epochs.events).

jsosulski · 2021-11-30T07:40:12Z

For this specific use case could you make use of write_events to write a little companion -eve.fif or -eve.txt file alongside your epochs?

Sure I can do that. I think I could also call _drop_log_stats directly, instead of the class method, where I pass my own drop log.

A reset_selection method would enable at least the behaviour where a subset selection resets the drop log stats to 0 and does not output spurious values. However, then #9726 would only be fixed when the user manually calls the reset method. Alternatively, I could make drop_log_stats emit a warning when drop_log entries outside the selection do not have 'IGNORED', e.g., any(ignore not in epochs.drop_log[~epochs.selection]).

larsoner · 2021-11-30T19:29:28Z

I think I could also call _drop_log_stats directly, instead of the class method, where I pass my own drop log.

It's not a good idea to use private functions, if this is the preferred solution we'd want to make it public for you.

I'm not sure what the best solution is here, what do you think allows us to make the smallest API changes / expansion while still enabling you to do what you need (and also preventing other users from being confused or making mistakes) @jsosulski ?

jsosulski · 2021-12-03T07:50:40Z

It is no urgent need for my use case, as I can just call drop after doing epoch subselection. Then the drop log stats work as intended.
To fix the very wrong drop log stats with few API changes, I would propose to use a fix like the one in this PR, as this changes the drop log stats to slightly wrong 0.

After thinking about it a bit more, I think the core issues is that epoch.selection and epoch.drop_log have overlapping responsibilities. Drop log is set during selection operations and selection is set during dropping operations.
If epoch.selection is only set by indexing operations, i.e., __getitem__, and epoch.drop_log by dropping operations, it would be straight forward to calculate the drop log stats correctly. And the current behaviour that epoch.selection is the 'ground truth' of selected epochs is simply obtained by having something (in pseudocode) like epoch.selection & ~epoch.drop_log

Then drop log stats could then simply use epoch.drop_log[epoch.selection]

Anyway this is a imo larger change, and to preserve backwards compatibility I would instead add, e.g., epoch.selected_epochs and remove the epoch.selection attribute and replace it with a @property that automatically returns epoch.selected_epochs & ~epoch.drop_log which would result in the current output of epoch.selection

larsoner · 2021-12-03T13:03:51Z

Anyway this is a imo larger change, and to preserve backwards compatibility I would instead add, e.g., epoch.selected_epochs and remove the epoch.selection attribute and replace it with a @Property that automatically returns epoch.selected_epochs & ~epoch.drop_log which would result in the current output of epoch.selection

I think there are many (years worth) epochs in the wild that have epochs.selection already, which we would have to figure out how to deal with. And we'd need to add a new, different constart to the FIF spec for this I think. So I agree it's a pretty big change. Not sure it's worth the effort if there are suitable workarounds.

To fix the very wrong drop log stats with few API changes, I would propose to use a fix like the one in this PR, as this changes the drop log stats to slightly wrong 0.

Okay, want to update the PR this way? I don't fully follow the case(s) that have changed, I'm hoping that the updated tests will make it clearer.

Another complementary option would be to add a short section in the tutorial (or example) docs about how to get the functionality and information you need using the current way of doing things (or way of doing things with the fix in this PR).

jsosulski added 2 commits September 21, 2021 18:14

FIX: Droplogstats now return 0 on epoch subsets

d172b23

adapt test_epochs_eq to adapt to new unselect

c8504b4

IGNORED tags

jsosulski mentioned this pull request Sep 22, 2021

Drop log handling in sliced epochs not intuitive #9726

Open

larsoner added this to the 0.24 milestone Oct 22, 2021

larsoner reviewed Oct 26, 2021

View reviewed changes

larsoner mentioned this pull request Oct 26, 2021

MRG, MAINT: Update epochs test #9903

Merged

larsoner modified the milestones: 0.24, 1.0 Oct 26, 2021

larsoner modified the milestones: 1.0, 1.1 Feb 11, 2022

larsoner modified the milestones: 1.1, 1.2 Jun 4, 2022

larsoner modified the milestones: 1.2, 1.3 Sep 13, 2022

larsoner removed this from the 1.3 milestone Dec 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix drop log #9768

Fix drop log #9768

jsosulski commented Sep 22, 2021

larsoner commented Sep 22, 2021

larsoner commented Oct 22, 2021

jsosulski commented Oct 25, 2021

jsosulski commented Oct 26, 2021

larsoner Oct 26, 2021

jsosulski Oct 27, 2021

larsoner Oct 27, 2021

jsosulski commented Nov 25, 2021

larsoner commented Nov 25, 2021

jsosulski commented Nov 26, 2021

larsoner commented Nov 28, 2021

jsosulski commented Nov 30, 2021

larsoner commented Nov 30, 2021

jsosulski commented Dec 3, 2021

larsoner commented Dec 3, 2021

	for d1, d2 in zip(epochs1.drop_log, epochs2.drop_log):
	if d1 == ('IGNORED',):
	assert (d2 == ('IGNORED',))
	if d1 != ('IGNORED',) and d1 != []:
	assert ((d2 == d1) or (d2 == ('IGNORED',)))
	if d1 == []:
	assert (d2 == [])

Fix drop log #9768

Are you sure you want to change the base?

Fix drop log #9768

Conversation

jsosulski commented Sep 22, 2021

Reference issue

What does this implement/fix?

Additional information

larsoner commented Sep 22, 2021

larsoner commented Oct 22, 2021

jsosulski commented Oct 25, 2021

jsosulski commented Oct 26, 2021

larsoner Oct 26, 2021

Choose a reason for hiding this comment

jsosulski Oct 27, 2021

Choose a reason for hiding this comment

larsoner Oct 27, 2021

Choose a reason for hiding this comment

jsosulski commented Nov 25, 2021

larsoner commented Nov 25, 2021

jsosulski commented Nov 26, 2021

larsoner commented Nov 28, 2021

jsosulski commented Nov 30, 2021

larsoner commented Nov 30, 2021

jsosulski commented Dec 3, 2021

larsoner commented Dec 3, 2021