You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When using the ion command line tool (and by extension this library), my understanding is that a file containing:
'''x''''
should be equivalent to
"x'"
Instead, it reads ~as:
"x"
Syntax error: unexpected "'" before end of file
I'm submitting this from my work laptop (feel free to look me up internally), but I can supply the actual error message from my personal machine when I get home if that would be helpful. I also have not tested this with the non-steam parser; this came up as I was attempting to a validate an ion tree-sitter grammar.
The x isn't required to surface the bug - ''''''' also fails to read as "'", but the x makes the example string easier to read.
I suspect this is a result of greedily treating three single quotes in a long string as termination, despite that not necessarily being the case.
As far as I can tell, there isn't any reason that this process should be greedy. Should a bunch of single quotes show up in a row at the end of a long string, parsing them while allowing for this case is unambiguous (1).
In terms of streaming performance, I also cannot see a reason why allowing for such strings would cause a change in behavior. Long strings can be split across multiple sections, so a full string can't be emitted from a stream reader when one long-string section is terminated.
E.g.: if a process was reading a stream of data via an ion stream reader:
---- Packet 1, T1 ----
'''\
Some long string
'''
---- Packet 2, T2 = T1 + 10 seconds ----
'''\
Line 2 of a long string
'''
---- Packet 3, T3 = T2 + 10 seconds ----
123
The long string starting in Packet 1 could not be processed fully processed until T3 anyways, due to the possibility of another long string section.
(1):
'''
x'// Parser state: a superposition of:
// - "in the middle of a long string section, last two characters in the long string are
// single quotes"
// String so far: "\nx'"
// ^ Non ' will collapse to this branch
// - "In the middle of a long string section start token"
// String so far: "\nx"
'''
x''// Parser state: a superposition of:
// - "in the middle of a long string section, last two characters in the long string are
// single quotes"
// String so far: "\nx''"
// ^ Non ' will collapse to this branch
// - "In the middle of a long string section start token"
// String so far: "\nx"
'''
x'''// Parser state: Long string containing "\nx"
'''
x''''// Parser state: a superposition of:
// - "A long string section, the contents of which end with a single quote"
// String so far: "\nx'"
// ^ Non ' will collapse to this branch
// - "One ' read in a ''' long string section start token"
// String so far: "\nx"
'''
x'''''// Parser state: a superposition of:
// - "A long string section, the contents of which end with a single quote"
// String so far: "\nx''"
// ^ Non ' will collapse to this branch
// - "Two ' read in a ''' long string section start token"
// String so far: "\nx"
'''
x''''''// Parser state: Just started a new long string section. String so far: "\nx"
'''
x'''''''// Parser state: a superposition of:
// - Just started a new long string section.
// String so far: "\nx'"
// ^ Non ' will collapse into this branch
// - One ' into a ''' long string end token.
// String so far: "\nx"
'''
x''''''''// Parser state: a superposition of:
// - Just started a new long string section.
// String so far: "\nx''"
// ^ Non ' will collapse into this branch
// - Two ' into a ''' long string end token.
// String so far: "\nx"
'''
x'''''''''// Parser state: Just started a new long string section. String so far: "\nx"
At which point we loop back to the first state should we see another '
The text was updated successfully, but these errors were encountered:
I believe this behavior is actually correct, and all Ion parsing implementations that I've tried with this input (ion-c, ion-rust, ion-java) behave the same way.
Note that this does not prevent you from encoding the text x' using a long string. You simply need to escape the ' character that is intended to be part of the data, e.g. '''x\''''.
A similar thing happens when using short strings, e.g. """ produces an error, but "\"" produces the text ".
If Ion did not allow for quoted symbols, I think there would be a stronger case for not tokenizing ‘’’ greedily. As it stands, though, that greatly complicates the “seven single quotes in a row” handling in the cases where a symbol is valid following a string, and the permissive interpretation would probably come with some nasty surprises.
Sorry for the spurious report! Glad to see I was wrong on this one.
Would you accept a contribution of seven single quotes in a row as bad test data? I couldn't find it there, and it would be nice to make this policy explicit. On the other hand, it would be a bit of an odd case since the "bad" test data would start with a valid string.
When using the
ion
command line tool (and by extension this library), my understanding is that a file containing:should be equivalent to
Instead, it reads ~as:
I'm submitting this from my work laptop (feel free to look me up internally), but I can supply the actual error message from my personal machine when I get home if that would be helpful. I also have not tested this with the non-steam parser; this came up as I was attempting to a validate an ion tree-sitter grammar.
The
x
isn't required to surface the bug -'''''''
also fails to read as"'"
, but thex
makes the example string easier to read.I suspect this is a result of greedily treating three single quotes in a long string as termination, despite that not necessarily being the case.
As far as I can tell, there isn't any reason that this process should be greedy. Should a bunch of single quotes show up in a row at the end of a long string, parsing them while allowing for this case is unambiguous (1).
In terms of streaming performance, I also cannot see a reason why allowing for such strings would cause a change in behavior. Long strings can be split across multiple sections, so a full string can't be emitted from a stream reader when one long-string section is terminated.
E.g.: if a process was reading a stream of data via an ion stream reader:
The long string starting in Packet 1 could not be processed fully processed until T3 anyways, due to the possibility of another long string section.
(1):
At which point we loop back to the first state should we see another
'
The text was updated successfully, but these errors were encountered: