nvarchar character encoding in Netezza db #126
Comments
The |
Already done ... The encoding is UTF-8. I am not using Chinese characters, but this is what I see as output: |
What are the values returned for this query if you use nzsql directly? This is likely due to a mismatch in the Unicode width between the driver and the database. You may have to set the See also https://stackoverflow.com/questions/21536059/python-pyodbc-connections-to-ibm-netezza-erroring, which has some answers for a similar problem with pyodbc. |
Thanks to Jim for his support. nzsql works fine. My odbc.ini is set up as described in the first link. I am aware of the cast workaround, and it works fine. But when moving forward to testing the Netezza-dplyr interface, this is not enough :-( |
Did you try using |
A few months back we encountered the same problem. I opened an RStudio support case for this (21266), but we didn't find a solution. For now we are also casting the values, which does not scale to many users and many queries... |
Changing the value for UnicodeTranslationOption does not seem to have any effect. |
It looks like UnicodeTranslationOption only relates to 'normal' varchar fields and not nvarchar. |
In conclusion, it seems that nvarchar values are not shown correctly when using the Netezza driver with the odbc package. |
We just upgraded our server to RStudio 1.1. Not sure if it is related, but we see the same special characters in the database, schema and table names in the IDE Connections pane for Netezza connections. |
Another datapoint: Netezza connectivity works perfectly with
|
I see this issue got a label reprex. |
Unfortunately I have no C++ or R package development experience and cannot investigate this problem.
When we change the encoding parameter to UTF-16, the first field that was correctly decoded earlier now has the same error as the second field. Note that the 6 bytes from the string get decoded as 3 characters (which makes sense for UTF-16).
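The byte-count observation can be checked outside of R. The following is a minimal Python sketch, not taken from the thread: the sample string `abcdef` is a hypothetical stand-in for the actual field contents, which are not shown here. It illustrates that six bytes of single-byte text, when decoded as UTF-16, pair up into three code units, and that those code units may land in the CJK range.

```python
# Hypothetical sample value; the real nvarchar contents are not shown in the thread.
raw = "abcdef".encode("utf-8")       # 6 bytes of single-byte UTF-8 text

as_utf8 = raw.decode("utf-8")        # decoded with the matching encoding: 6 characters
as_utf16 = raw.decode("utf-16-le")   # every 2 bytes are read as one UTF-16 code unit

# Byte count, then character count under each decoding.
print(len(raw), len(as_utf8), len(as_utf16))

# Show the resulting code points in hex rather than printing the glyphs,
# so the output does not depend on the terminal encoding.
print([hex(ord(ch)) for ch in as_utf16])
```

If the driver hands back single-byte data while the client decodes it as UTF-16 (an assumption, not something confirmed in this thread), this is the kind of output one would expect to see.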
Note that this is with the IBM ODBC drivers. I believe there are also Simba Netezza ODBC drivers. I wonder if the same error happens there. Unfortunately I have no access to those drivers myself. |
In this case it means I need to be able to reproduce it myself, but I cannot because I do not currently have access to a netezza db |
We had this same issue, wanted to share some insights on what we found. If one looks at the bytes, it is an endian problem (UTF-16LE vs UTF-16BE), i.e. the order of the bytes. We were able to duplicate the issue from R in Python. Python has an encode('utf-16le') function that can transpose the bytes and then it was readable data when printed. I'm not sure if it is the driver or odbc package that needs additional modification, but something needs to be able to take an option of UTF-16LE vs UTF-16BE, or something for endianness that would also work with UTF-32, etc. |
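To make the byte-order hypothesis concrete, here is a small Python sketch under stated assumptions: the sample text is hypothetical, and the helper name `swap_utf16_byte_order` is invented for illustration. It is not part of the odbc package or the Netezza driver. It shows how UTF-16BE bytes read as UTF-16LE turn into unrelated characters, and how re-encoding with the byte order that was wrongly assumed, then decoding with the other one, recovers the text.

```python
def swap_utf16_byte_order(garbled: str) -> str:
    """Undo a UTF-16 byte-order mix-up (hypothetical helper).

    Re-encoding as little endian recovers the raw bytes that were
    mis-decoded; decoding those bytes as big endian restores the text.
    """
    return garbled.encode("utf-16-le").decode("utf-16-be")


original = "Netezza"                      # hypothetical sample value
wire_bytes = original.encode("utf-16-be")  # assumed: data arrives big endian
garbled = wire_bytes.decode("utf-16-le")   # client assumes little endian

repaired = swap_utf16_byte_order(garbled)

# Print code points in hex so the output is independent of the terminal encoding.
print([hex(ord(ch)) for ch in garbled])
print(repaired)
```

This only demonstrates what an endianness mismatch would look like; whether that is the actual cause here is not established by the sketch.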
The ODBC Specification seems to be that all SQLWCHAR data should be little endian
So this seems like a driver issue to me, I am not sure if there is a Netezza driver configuration setting that can change this or not. You should be able to convert the data after retrieval by using |
Thanks for the update. I have a support ticket open with IBM. |
Can you try setting the locale as described in the following URL to see if that help? https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_specify_encoding.html |
No effect. I believe that relates to the nzsql tool and not the odbc connection. |
I had another look at this. I do not believe it is an endian problem. |
Issue Description and Expected Result
When querying Netezza on nvarchar fields, the result contains strange characters instead of plain UTF-8 text.
The issue occurs only on nvarchar.
varchar, date_time and numeric work correctly.
Database
IBM Netezza
Reproducible Example