yaml serialization error #9

nsheff · 2016-07-01T15:34:36Z

@cdietzgit and I have found this error for some annotation sheets:


Traceback (most recent call last):
  File "/home/sheffien/.local/bin/looper", line 9, in <module>
    load_entry_point('looper==0.3', 'console_scripts', 'looper')()
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/looper.py", line 409, in main
    sample.to_yaml()
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 686, in to_yaml
    serial = obj2dict(self)
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 670, in obj2dict
    return {k: obj2dict(v) for k, v in obj.__dict__.items() if (k not in to_skip)}
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 670, in <dictcomp>
    return {k: obj2dict(v) for k, v in obj.__dict__.items() if (k not in to_skip)}
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 672, in obj2dict
    return obj.item()
  File "/home/sheffien/.local/lib/python2.7/site-packages/pandas/core/base.py", line 827, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar


nsheff · 2016-07-01T15:48:20Z

This was caused by a new sample-attribute I added called "sheet_attributes", which was a list of all the columns that were originally included in the annotation sheet.

I use this to spit out the output in summarize, so I don't include all the new attributes constructed by looper.

@afrendeiro can you think of a more robust way to handle yaml serialization than a hard-coded list of attributes to skip?

afrendeiro · 2016-07-02T09:29:54Z

I'd be happy to have a generalizable way to serialize to yaml without hardcoding anything. However, this is only done now to prevent infinite recursion, since objects are tied to each other (e.g. a Project object is an attribute of a Sample, and a Sample object is inside a list which is an attribute of a Project), so I don't think skipping specific attributes is the cause of this.
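To make the recursion problem concrete, here is a minimal sketch of a converter that avoids the Project/Sample cycle by tracking which objects are already on the current path, rather than by a hard-coded skip list. The `Project` and `Sample` classes and this `obj2dict` are simplified, hypothetical stand-ins, not looper's actual code.

```python
class Project(object):
    """Hypothetical stand-in: holds a list of samples."""
    def __init__(self, name):
        self.name = name
        self.samples = []


class Sample(object):
    """Hypothetical stand-in: keeps a back-reference to its project."""
    def __init__(self, name, prj):
        self.name = name
        self.prj = prj            # Sample -> Project
        prj.samples.append(self)  # Project -> list -> Sample (a cycle)


_SKIP = object()  # sentinel marking a back-reference that must be dropped


def obj2dict(obj, _path=frozenset()):
    """Convert an object graph to plain dicts and lists, dropping cycles."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if id(obj) in _path:
        return _SKIP  # already being converted further up: a cycle
    path = _path | {id(obj)}
    if isinstance(obj, (list, tuple)):
        items = (obj2dict(v, path) for v in obj)
        return [v for v in items if v is not _SKIP]
    if isinstance(obj, dict):
        source = obj
    elif hasattr(obj, "__dict__"):
        source = vars(obj)
    else:
        return str(obj)  # fallback for anything else
    converted = ((k, obj2dict(v, path)) for k, v in source.items())
    return {k: v for k, v in converted if v is not _SKIP}


prj = Project("demo")
sample = Sample("s1", prj)
serial = obj2dict(sample)
```

In this sketch the sample's project is kept, but the project's reference back to the sample is dropped, so the conversion terminates without naming any attribute explicitly.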

In that specific case, it seems to be a problem of determining the type of the attribute (it thinks it is a numpy data type). How's that list being added to the object? Through the annotation sheet or later, after object creation? What are the elements of that list, only strings?

With a list as an element of a series, you still seem able to retrieve it correctly:

s = pd.Series([list(range(3))])  # series with a list as its only element
s.dtype
>>> dtype('O')  # type object
s.item()
>>> [0, 1, 2]

Obviously, skipping that element fixes it for now, but we still don't understand why it is failing. I guess I'd need to see that concrete case in particular.

nsheff · 2016-07-02T14:44:59Z

It's at init, link right here: https://github.com/epigen/looper/blob/1e08758760b8a12e4240ac736df5132a8871d122/looper/models.py#L535

afrendeiro · 2016-07-03T10:28:14Z

Okay, that's not very good. What is being assigned to sheet_attributes is an Index object (such as the RangeIndex in the example below) and not a list, and you can't serialize arbitrary Python objects to YAML (you can to other formats, obviously).
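As a small illustration of why this leads to the ValueError in the traceback: calling `.item()` on an Index only works when it holds exactly one element. The column names here are made up for the example.

```python
import pandas as pd

# Hypothetical annotation-sheet column names, held in a pandas Index.
columns = pd.Index(["sample_name", "library", "organism"])

try:
    columns.item()  # more than one element, so no single scalar exists
    raised = False
except ValueError:
    raised = True

# With exactly one element, .item() returns that element.
one = pd.Index(["sample_name"]).item()
```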

Pandas DataFrame columns and index (or just Series indexes) are objects of their own which have very useful methods and attributes. If you want them to be a list, you have to say so explicitly. The correct approach would be to use self.sheet_attributes = series.index.tolist():

series = pd.Series(range(3))
series.keys()  # gives object
>>> RangeIndex(start=0, stop=3, step=1)
series.index.tolist()  # gives list
>>> [0, 1, 2]

Accessing columns or indexes through a method (keys()) is also not very common.
I won't commit the change since I cannot test this now, but this should now produce correct serialization. Feel free to make the change and remove sheet_attributes from the exclude list.
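More generally, a converter could check for pandas and numpy containers before falling back to `.item()`. The helper below is only a sketch under that assumption; `to_plain` is a hypothetical name and is not part of looper.

```python
import numpy as np
import pandas as pd


def to_plain(value):
    """Convert pandas/numpy containers and scalars to built-in Python types."""
    if isinstance(value, (pd.Index, pd.Series, np.ndarray)):
        return value.tolist()  # any length, becomes a plain list
    if isinstance(value, np.generic):
        return value.item()    # numpy scalar, becomes a Python scalar
    return value               # already a plain Python value


# Hypothetical sample row, as it might come from an annotation sheet.
series = pd.Series({"sample_name": "s1", "library": "ATAC-seq"})
sheet_attributes = to_plain(series.keys())
read_count = to_plain(np.int64(42))
```

Checking the container types first means `.item()` is only ever called on true scalars, so the size-1 restriction never comes into play.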

By the way, you might want to use those values to filter the sample attributes to be retrieved in a function similar to as_series, something like original_series, or maybe better in the SampleSheet object, analogous to as_data_frame.
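A rough sketch of what such an original_series could look like, assuming a simplified Sample that copies each annotation-sheet value onto itself as an attribute. The class body and the results_subdir attribute are hypothetical, not looper's actual implementation.

```python
import pandas as pd


class Sample(object):
    """Hypothetical, simplified stand-in for looper's Sample."""

    def __init__(self, series):
        # Remember the original annotation-sheet columns as a plain list.
        self.sheet_attributes = series.index.tolist()
        for key, value in series.items():
            setattr(self, key, value)

    def original_series(self):
        """Return only the attributes that came from the annotation sheet."""
        return pd.Series({k: getattr(self, k) for k in self.sheet_attributes})


sample = Sample(pd.Series({"sample_name": "s1", "library": "ATAC-seq"}))
sample.results_subdir = "results/s1"  # attribute derived later, not in the sheet
original = sample.original_series()
```

Here original contains sample_name and library but not results_subdir, which is the filtering that summarize needs.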

nsheff referenced this issue Jul 1, 2016

merge dev

73389c8

nsheff closed this as completed in 4b5114d Jul 1, 2016

nsheff added a commit that referenced this issue Jan 23, 2017

skip sheet attributes in yaml serialization. Fix #9

e6475d9
