Currently, pragmatic_segmenter returns instances of `PragmaticSegmenter::Text`, which is a subclass of `String`. Because pragmatic_tokenizer checks `text.class == String`, and because the segmented objects come back as a different class than the one initially passed in, I suggest returning plain strings instead of instances of this subclass, which is only used internally.
I wonder whether there is a smarter approach than calling `#to_s` when returning the result, as that would unnecessarily duplicate the strings in memory. Maybe, instead of using a subclass of `String`, the `String` class could be extended with a module that provides the single method being used? (That method should get a name that is unlikely to clash with anyone's code if they also decide to extend the `String` class.)
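To make the problem and the two options concrete, here is a minimal sketch. `DemoText`, `DemoSegmenterHelpers`, and both `*_word_count` methods are hypothetical stand-ins invented for illustration; they are not taken from pragmatic_segmenter or pragmatic_tokenizer.

```ruby
# Hypothetical stand-in for an internal String subclass (not the gem's code).
class DemoText < String
  def demo_word_count
    split.size
  end
end

segment = DemoText.new("Hello world.")

# An exact class comparison rejects the subclass instance...
subclass_is_exact_string = segment.class == String  # false
# ...while is_a? accepts it.
subclass_is_a_string = segment.is_a?(String)         # true

# Option 1: convert on the way out. String#to_s on a subclass instance
# returns a new object whose class is plain String.
converted = segment.to_s
converted_is_exact_string = converted.class == String # true

# Option 2 (the idea from the issue): mix a module into String and give the
# method a prefixed, hypothetical name to reduce the chance of clashes.
module DemoSegmenterHelpers
  def demo_segmenter_word_count
    split.size
  end
end
String.include(DemoSegmenterHelpers)

plain = "Hello world."
plain_is_exact_string = plain.class == String         # true
plain_count = plain.demo_segmenter_word_count         # 2
```

With option 2, the values handed back to callers remain plain `String` objects, so no conversion is needed on return, at the cost of adding a method to every `String` in the process.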