Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix drop log #9768

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Fix drop log #9768

wants to merge 2 commits into from

Conversation

jsosulski
Copy link
Contributor

Reference issue

Fixes #9726

What does this implement/fix?

Instead of adding 'IGNORED' only to empty tuples in the drop log, this will append it to all drop log entries when slicing an epoch object to a subset.

Additional information

I fixed one epoch unit test, which was straight forward, however the second one that fails is test_drop_epochs_mult where there is a section I don't really get:

for d1, d2 in zip(epochs1.drop_log, epochs2.drop_log):
if d1 == ('IGNORED',):
assert (d2 == ('IGNORED',))
if d1 != ('IGNORED',) and d1 != []:
assert ((d2 == d1) or (d2 == ('IGNORED',)))
if d1 == []:
assert (d2 == [])

There are several comparisons to empty lists [] which, as far as I understand the drop log correctly, can never be true/false as the entries can only be empty tuples, not empty lists. I could take a stab at fixing this unit test as well, once I understand it correctly.

Also currently I assume that a drop reason should be present only once in a drop log entry, however, one could argue that multiples are okay, to e.g. reconstruct the order of subset selections if for some reason someone would be interested in this.

OT: Probably too big of a change, but using 'UNSELECTED' or 'DESELECTED' as a drop reason when slicing epochs would make the drop log explain more precisely why the epochs were dropped. One could then add 'IGNORED' as a special case that incorporates 'UNSELECTED' and other common reasons, or just use an enum.

@larsoner
Copy link
Member

There are several comparisons to empty lists [] which, as far as I understand the drop log correctly, can never be true/false as the entries can only be empty tuples, not empty lists.

Yes this looks like a bug to me

Also currently I assume that a drop reason should be present only once in a drop log entry

Agreed, I wouldn't bother appending more and more 'IGNORED's with each new __getitem__ call, which it sounds like would be the resulting effect...

OT: Probably too big of a change

Agreed we shouldn't go this route unless we really need to / have strong motivating use cases at some point.

@larsoner
Copy link
Member

@jsosulski FYI we're planning on releasing 0.24 in just over a week. Do you have time to work on this in that time frame? If not, do you want to work on it afterward for the next release, or would you prefer someone else try to help / take over?

@larsoner larsoner added this to the 0.24 milestone Oct 22, 2021
@jsosulski
Copy link
Contributor Author

I will check today evening if I can fix the unit tests while still keeping their semantic logic the same as before. If not I will let you know.

@jsosulski
Copy link
Contributor Author

I managed to fix the unit test for my case but broke several others by doing so. So either I will retry for a future release, or someone else can try it.

Comment on lines +201 to +204
select_all = list(range(len(inst.drop_log)))
for k in np.setdiff1d(select_all, key_selection):
if reason not in drop_log[k]:
drop_log[k] += (reason,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fix has an issue. Take for example, epochs1 = Epochs(raw, events, dict(a=1, b=2)) we could drop some events for event type a, e.g., due to MEG2663. Then imagine we do epochs1['a']. When this happens, this code will go over all indices that are not in selection, but we know that some of the ones not in selection were originally a type -- and these will end up having 'IGNORED' added to their drop log even though they shouldn't.

I'm not sure of a good way to fix this, but I'll push a test in a separate PR that exposes the problem in this one at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Maybe I could add a case differentiation depending on whether the selection was done by class/event or by index. In the latter case, this issue cannot appear as we cannot select dropped epochs by index.

Btw, am I missing any other epochs[idx] possibilities? I.e. idx can be: string -> select by one or multiple events or it can be: slice/int/range -> select by epoch index.
I just saw in the docs idx can be a pandas query, in which case I dont know if unselected epochs should get the 'IGNORED' tag or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m I missing any other epochs[idx] possibilities

For this first I would look at the doc

https://mne.tools/stable/generated/mne.Epochs.html#mne.Epochs.__getitem__

idx can be a pandas query, in which case I dont know if unselected epochs should get the 'IGNORED' tag or not.

... then for the real answer to all these dig into mne.utils.mixin.py:GetEpochsMixin._getitem

@larsoner larsoner modified the milestones: 0.24, 1.0 Oct 26, 2021
@jsosulski
Copy link
Contributor Author

After some detachment of this issue, I think I was trying to misuse the droplog for my purposes. epo.selection should have all the info I need and I could subselect the droplog that is passed to _drop_log_stats.

Do you prefer I force push my changes (needed to rebase locally) to this PR, or close it and create a fresh one?

@larsoner
Copy link
Member

Force push is better I think, keeps all discussion in one place

@jsosulski
Copy link
Contributor Author

Nevermind, I cant figure out a way to make this work without re-structuring the drop_log / selection / events setup, because lets say you have first five epochs as target events, and the next five epoch as nontargets, and the last 2 of each class are bad:

epo  # 5 target epochs, 5 nontarget epochs
epo.drop_bad()  # 3 target epochs, 3 nontarget epochs
epo.selection  #  is now [0, 1 ,2, 5, 6, 7]
len(epo.events)  # is now 6
epo.drop_log  # is now ((), (), (), ('d'), ('d'), (), (), (), ('d'), ('d'))
epo_t = epo['target']  # 3 target epochs, 0 nontarget epochs
epo_t.selection  # is now [0, 1, 2]
# Current MNE Version:
epo_t.drop_log  # is now ((), (), (), ('d'), ('d'), ('IGNORED'), ('IGNORED'), ('IGNORED'), ('d'), ('d'))
# This PR as of now:
epo_t.drop_log  # is now ((), (), (), ('IGNORED','d'), ('IGNORED','d'), ('IGNORED'), ('IGNORED'), ('IGNORED'), ('IGNORED','d'), ('IGNORED','d'))

A workaround would be to call drop_bad() only after all subset selections have been done.

What I would need is to have the event-wise information for all ('d') drop log entries, but the events have been dropped as far as I can tell, and I can no longer conclude which of the drops belonged to which event. Or am I missing something?

The only solution I can think of is to have the original events array somewhere in the epochs object available. Without this information I think only having drop_log_stats() reset to 0 after each Epochs subselection is feasible.

@larsoner
Copy link
Member

The only solution I can think of is to have the original events array somewhere in the epochs object available

For this specific use case could you make use of write_events to write a little companion -eve.fif or -eve.txt file alongside your epochs? This could be a suitable workaround for now, and if lots of people request this feature as well we could start writing the original events, too. (We try not to add too many things to objects, so it would be nice to see if doing so would benefit other use cases / users rather than adding it here/now.)

Without this information I think only having drop_log_stats() reset to 0 after each Epochs subselection is feasible.

I could see a epochs.reset_selection() or reset_drop_log being generally useful and not too hard to support. Would this be another different way to get the behavior/information you need? Basically it would make any given Epochs have epochs.selection = np.arange(len(epochs.events)) and epochs.drop_log = ((),) * len(epochs.events).

@jsosulski
Copy link
Contributor Author

For this specific use case could you make use of write_events to write a little companion -eve.fif or -eve.txt file alongside your epochs?

Sure I can do that. I think I could also call _drop_log_stats directly, instead of the class method, where I pass my own drop log.

A reset_selection method would enable at least the behaviour where a subset selection resets the drop log stats to 0 and does not output spurious values. However, then #9726 would only be fixed when the user manually calls the reset method. Alternatively, I could make drop_log_stats emit a warning when drop_log entries outside the selection do not have 'IGNORED', e.g., any(ignore not in epochs.drop_log[~epochs.selection]).

@larsoner
Copy link
Member

I think I could also call _drop_log_stats directly, instead of the class method, where I pass my own drop log.

It's not a good idea to use private functions, if this is the preferred solution we'd want to make it public for you.

I'm not sure what the best solution is here, what do you think allows us to make the smallest API changes / expansion while still enabling you to do what you need (and also preventing other users from being confused or making mistakes) @jsosulski ?

@jsosulski
Copy link
Contributor Author

It is no urgent need for my use case, as I can just call drop after doing epoch subselection. Then the drop log stats work as intended.
To fix the very wrong drop log stats with few API changes, I would propose to use a fix like the one in this PR, as this changes the drop log stats to slightly wrong 0.

After thinking about it a bit more, I think the core issues is that epoch.selection and epoch.drop_log have overlapping responsibilities. Drop log is set during selection operations and selection is set during dropping operations.
If epoch.selection is only set by indexing operations, i.e., __getitem__, and epoch.drop_log by dropping operations, it would be straight forward to calculate the drop log stats correctly. And the current behaviour that epoch.selection is the 'ground truth' of selected epochs is simply obtained by having something (in pseudocode) like epoch.selection & ~epoch.drop_log

Then drop log stats could then simply use epoch.drop_log[epoch.selection]

Anyway this is a imo larger change, and to preserve backwards compatibility I would instead add, e.g., epoch.selected_epochs and remove the epoch.selection attribute and replace it with a @property that automatically returns epoch.selected_epochs & ~epoch.drop_log which would result in the current output of epoch.selection

@larsoner
Copy link
Member

larsoner commented Dec 3, 2021

Anyway this is a imo larger change, and to preserve backwards compatibility I would instead add, e.g., epoch.selected_epochs and remove the epoch.selection attribute and replace it with a @Property that automatically returns epoch.selected_epochs & ~epoch.drop_log which would result in the current output of epoch.selection

I think there are many (years worth) epochs in the wild that have epochs.selection already, which we would have to figure out how to deal with. And we'd need to add a new, different constart to the FIF spec for this I think. So I agree it's a pretty big change. Not sure it's worth the effort if there are suitable workarounds.

To fix the very wrong drop log stats with few API changes, I would propose to use a fix like the one in this PR, as this changes the drop log stats to slightly wrong 0.

Okay, want to update the PR this way? I don't fully follow the case(s) that have changed, I'm hoping that the updated tests will make it clearer.

Another complementary option would be to add a short section in the tutorial (or example) docs about how to get the functionality and information you need using the current way of doing things (or way of doing things with the fix in this PR).

@larsoner larsoner modified the milestones: 1.0, 1.1 Feb 11, 2022
@larsoner larsoner modified the milestones: 1.1, 1.2 Jun 4, 2022
@larsoner larsoner modified the milestones: 1.2, 1.3 Sep 13, 2022
@larsoner larsoner removed this from the 1.3 milestone Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop log handling in sliced epochs not intuitive
2 participants