
yaml serialization error #9

Closed
nsheff opened this issue Jul 1, 2016 · 4 comments


@nsheff
Contributor

nsheff commented Jul 1, 2016

@cdietzgit and I have found this error for some annotation sheets:


Traceback (most recent call last):
  File "/home/sheffien/.local/bin/looper", line 9, in <module>
    load_entry_point('looper==0.3', 'console_scripts', 'looper')()
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/looper.py", line 409, in main
    sample.to_yaml()
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 686, in to_yaml
    serial = obj2dict(self)
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 670, in obj2dict
    return {k: obj2dict(v) for k, v in obj.__dict__.items() if (k not in to_skip)}
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 670, in <dictcomp>
    return {k: obj2dict(v) for k, v in obj.__dict__.items() if (k not in to_skip)}
  File "/home/sheffien/.local/lib/python2.7/site-packages/looper/models.py", line 672, in obj2dict
    return obj.item()
  File "/home/sheffien/.local/lib/python2.7/site-packages/pandas/core/base.py", line 827, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar
@nsheff
Contributor Author

nsheff commented Jul 1, 2016

This was caused by a new sample attribute I added called "sheet_attributes", which was a list of all the columns originally included in the annotation sheet.

I use this in summarize to output only those original columns, so the output doesn't include all the new attributes constructed by looper.

@afrendeiro can you think of a more robust way to handle yaml serialization than a hard-coded list of attributes to skip?

@afrendeiro
Contributor

I'd be happy to have a generalizable way to serialize to YAML without hardcoding anything. However, the skip list exists only to prevent infinite recursion, because the objects reference each other (e.g. a Project object is an attribute of a Sample, and a Sample object is inside a list that is an attribute of the Project). So I don't think skipping specific attributes is the cause of this.
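One possible way around the hard-coded skip list, sketched here under assumptions: track the ids of the objects currently being converted on the recursion path, and replace any object that reappears on that path with a placeholder. The function name obj_to_dict and the toy Project/Sample classes are hypothetical illustrations, not looper's actual code.

```python
def obj_to_dict(obj, _path=None):
    """Hypothetical sketch: convert obj into YAML-friendly builtins,
    cutting reference cycles instead of skipping named attributes."""
    if _path is None:
        _path = set()
    # Plain scalars are returned unchanged.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    # This object is already being converted further up this branch: a cycle.
    if id(obj) in _path:
        return "<cycle: {}>".format(type(obj).__name__)
    _path.add(id(obj))
    try:
        if isinstance(obj, dict):
            return {str(k): obj_to_dict(v, _path) for k, v in obj.items()}
        if isinstance(obj, (list, tuple, set)):
            return [obj_to_dict(v, _path) for v in obj]
        if hasattr(obj, "__dict__"):
            return {k: obj_to_dict(v, _path) for k, v in vars(obj).items()}
        # Anything else (e.g. a pandas Index) falls back to its repr.
        return repr(obj)
    finally:
        # Leaving this branch: the object may legitimately appear elsewhere.
        _path.discard(id(obj))


# Toy classes mimicking the Project <-> Sample cross-references (hypothetical).
class Project(object):
    def __init__(self, name):
        self.name = name
        self.samples = []


class Sample(object):
    def __init__(self, name, project):
        self.name = name
        self.project = project


prj = Project("demo")
smp = Sample("s1", prj)
prj.samples.append(smp)

result = obj_to_dict(smp)
print(result)
```

Because ids are removed again as the recursion unwinds, an object shared by two separate branches is serialized twice rather than flagged; only genuine reference loops are cut.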

In that specific case, it seems to be a problem with determining the type of the attribute (it thinks it is a numpy data type). How is that list being added to the object? Through the annotation sheet, or later, after object creation? Are the elements of that list only strings?

With a list as an element of a Series, you still seem to be able to retrieve it correctly:

>>> import pandas as pd
>>> s = pd.Series([list(range(3))])  # Series whose only element is a list
>>> s.dtype  # object dtype
dtype('O')
>>> s.item()
[0, 1, 2]

Obviously, skipping that element fixes it for now, but we still don't understand why it is failing. I'd need to see that concrete case in particular.

@nsheff
Contributor Author

nsheff commented Jul 2, 2016

@afrendeiro
Contributor

afrendeiro commented Jul 3, 2016

Okay, that's not very good. What is being assigned to sheet_attributes is a RangeIndex object, not a list, and you can't serialize an arbitrary Python object to YAML (you can with other formats, obviously).
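A minimal sketch of why the traceback ends in a ValueError, assuming (as line 672 of models.py suggests) that obj2dict ends up calling .item() on the Index. The annotation-sheet column names below are made up for illustration.

```python
import pandas as pd

# Toy annotation sheet; the column names are made up for illustration.
sheet = pd.DataFrame({"sample_name": ["s1"], "organism": ["human"]})
row = sheet.iloc[0]  # one sample, as a Series indexed by column name

# Assigning row.index stores a pandas Index object, not a list.
sheet_attributes = row.index
print(type(sheet_attributes).__name__)

# .item() only converts size-1 objects to a scalar, so calling it on a
# two-element Index raises the same kind of ValueError as in the traceback.
try:
    sheet_attributes.item()
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# An explicit conversion gives a plain list of str that YAML can represent.
attrs = row.index.tolist()
print(attrs)  # ['sample_name', 'organism']
```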

Pandas DataFrame columns and indexes (and a Series' index) are objects of their own, with very useful methods and attributes. If you want a list, you have to ask for one explicitly. The correct thing would be to use self.sheet_attributes = series.index.tolist():

>>> import pandas as pd
>>> series = pd.Series(range(3))
>>> series.keys()  # returns an Index object
RangeIndex(start=0, stop=3, step=1)
>>> series.index.tolist()  # returns a plain list
[0, 1, 2]

Accessing columns or indexes through a method (keys()) is also not very common.
I won't commit the change since I can't test it right now, but this should produce correct serialization. Feel free to make the change and remove sheet_attributes from the exclude list.

By the way, you might want to use those values to filter which sample attributes are retrieved, either in a function similar to as_series (something like original_series) or, maybe better, in the SampleSheet object, analogous to as_data_frame.
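A sketch of the original_series idea, assuming a much-simplified Sample class. The class, its constructor, the data_path attribute, and the column names are all hypothetical; the point is only that storing sheet_attributes as a plain list makes it easy to filter back to the annotation-sheet columns.

```python
import pandas as pd


class Sample(object):
    """Hypothetical, much-simplified stand-in for looper's Sample."""

    def __init__(self, series):
        # Remember the annotation-sheet columns as a plain list.
        self.sheet_attributes = series.index.tolist()
        for key, value in series.items():
            setattr(self, key, value)

    def original_series(self):
        # Only the attributes that came from the annotation sheet,
        # ignoring anything looper (or the user) added afterwards.
        return pd.Series(
            {key: getattr(self, key) for key in self.sheet_attributes}
        )


sheet = pd.DataFrame({"sample_name": ["s1"], "organism": ["human"]})
sample = Sample(sheet.iloc[0])
sample.data_path = "/data/s1"  # hypothetical attribute added after construction

result = sample.original_series().to_dict()
print(result)  # {'sample_name': 's1', 'organism': 'human'}
```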
