Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: edumanage/views/cat_user_api_proxy: 2to3 fix for CT handling #88

Merged
merged 2 commits into from
Mar 7, 2022

Conversation

vladimir-mencl-eresearch
Copy link
Collaborator

The code in cat_user_api_proxy changes Content-Type from text/html to application/json if it detects the payload returned is JSON - starts with '[' or '{'.

However, this code was missed in the Python3 overhaul.

In Python Requests, the type of response.content is bytes.

The existing comparison: r.content[0] in ['{', '['] works in Python2, where strings and bytes are not distinguished.

However, in Python3, r.content[0] returns just the value of the first byte (91), which does not match up with single-character unicode strings.

Fix this to work in both Python2 and Python3 by:

  • explicitly specifying the characters the value is compared to as bytes
  • using a 1 element slice of r.content instead of a single index to avoid getting the byte value directly

So the new check, compatible with both Python2 and Python3, is: r.content[0:1] in [b'{', b'[']

The code in cat_user_api_proxy changes Content-Type from text/html to application/json
if it detects the payload returned is JSON - starts with '[' or '{'.

However, this code was missed in the Python3 overhaul.

In Python Requests, the type of response.content is bytes.

The existing comparison: r.content[0] in ['{', '[']
works in Python2, where strings and bytes are not distinguished.

However, in Python3, r.content[0] returns just the value of the first byte (91),
which does not match up with single-character unicode strings.

Fix this to work in both Python2 and Python3 by:
* explicitly specifying the characters the value is compared to as bytes
* using a 1 element slice of r.content instead of a single index to avoid getting the byte value directly

So the new check, compatible with both Python2 and Python3, is: r.content[0:1] in [b'{', b'[']
@vladimir-mencl-eresearch
Copy link
Collaborator Author

Hi @zmousm , works all fine in my TEST environment - are you happy to merge it?

Cheers,
Vlad

Copy link
Contributor

@zmousm zmousm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vladimir-mencl-eresearch, thanks for spotting this.

Just to be accurate, for the record and for future reference, response.content was always a bytes object. What has changed in PY3 is the representation of bytes[index], which is an integer.

You can work around this by using the slice syntax, per your patch. For slightly (minimally) less awkward code, you can also use response.text, which returns unicode. In general this is obviously more risky, as it depends on requests receiving HTTP headers that allow for encoding detection. In the particular case, however, it is most probably not an issue and I guess it can be used safely. See inline for the respective suggestion.

Whatever you want, both solutions LGTM. You can merge it yourself or I can do it per your response.

@James-REANNZ
Copy link

After looking through the way requests handles encoding, I think using r.text is fairly safe. It first tries to detect the encoding based on the content, if that fails it falls back to assuming utf-8, replacing decoding errors with U+FFFD REPLACEMENT CHARACTER. This is more than good enough in this situation, as all the code is really doing is making a best effort at fixing up a mistake from the CAT API.

Use r.text as per review suggestion.

Co-authored-by: Zenon Mousmoulas <[email protected]>
@vladimir-mencl-eresearch
Copy link
Collaborator Author

Thanks @James-REANNZ !

Also given that the condition gets only evaluated when Content-Type is text/html, it makes sense to use r.text - I'll go with that.

Cheers,
Vlad

@zmousm zmousm merged commit 205b962 into grnet:master Mar 7, 2022
@vladimir-mencl-eresearch
Copy link
Collaborator Author

Just realised for Python2, using r.text would be inconsistent with the Requests API:

  • r.text returns unicode
  • in Python2, the string literals '[', '{' are byte strings.

But, Python2 string equality testing returns True for b'a' == u'a' ... and the condition still works.

Given Python2 is EOL for >2 years, and the test still works there, we can merge with r.text....

@vladimir-mencl-eresearch vladimir-mencl-eresearch deleted the fix/cat_api_content_type branch March 7, 2022 21:07
@zmousm
Copy link
Contributor

zmousm commented Mar 7, 2022

I think it's OK. You could also from __future__ import unicode_literals but it is not necessary just for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants