String interpolation proposal #9

Open: 14 commits to merge into base: main

estebanlm
Member

A proposal to include String interpolation as part of the Pharo language.

@jordanmontt
Member

+1 for the string interpolation in Pharo

@yannij

yannij commented May 19, 2022

+1 A lot of code I write (for web and logging) would be much easier to write, read, and change. Dealing with quotes and commas (concat) or streams is tedious. I currently use #format: and have to look it up all the time because I keep remembering #bindSomething from another dialect. If the syntax could be adopted by other Smalltalks then that would be even better.

@pavel-krivanek

Maybe it would make sense to introduce a pragma activating that plugin only for a given method.

@noha
Member

noha commented May 19, 2022

This proposal only describes the problem but not what the solution looks like

@estebanlm
Member Author

estebanlm commented May 19, 2022 via email

@estebanlm
Member Author

> Maybe it would make sense to introduce a pragma activating that plugin only for a given method.

I disagree with this. IMO it would defeat the purpose of the proposal.
Moreover, this is not a problem because, in the absence of the plugin, the string remains usable (it will just not interpolate the contents).

@estebanlm
Member Author

> This proposal only describes the problem but not what the solution looks like

Doesn't the linked prototype cover that?

@noha
Member

noha commented May 19, 2022

No, because it is linked and can be changed. So if we want to have things defined, they need to be written here. Maybe I'm too fast and it will change.

@yannij

yannij commented May 19, 2022

The #bindWith:with: ... is probably VW Smalltalk, which I don't have access to anymore. IIRC, the syntax/API is a mixture of the #expandMacroWith:with: and #format: in Pharo - which is why I have to look it up almost every time.

I'll look for a link to the prototype you mentioned. I followed a few links and ended up at the PhEP docs, and I wasn't sure if there was code or just a text proposal.

Hi, if you point me to those Smalltalk implementations I will take a look. For now we are just taking what I think is the better solution, which is to use the "format" notation, but with an interpolation inside (still, this needs to be explored, since for now it makes this solution incompatible with the usage of the #format: method, and this is not what we want). Take a look at the prototype implementation linked in the phep and tell me what you think. Cheers! Esteban

@yannij

yannij commented May 19, 2022

Aha. I had to navigate github to find the .md file, not the diff.

The prototype seems like an automated rewrite to achieve what we can already do with #format:. That's okay. However, having it applied to every String makes me a bit worried. I don't see how a pragma would be a problem. Maybe a String could be sent the #interpolate message instead. It just seems too much control is lost (but I guess that's why it's proposed as a language extension).

@estebanlm
Member Author

> The prototype seems like an automated rewrite to achieve what we can already do with #format:. That's okay. However, having it applied to every String makes me a bit worried. I don't see how a pragma would be a problem. Maybe a String could be sent the #interpolate message instead. It just seems too much control is lost (but I guess that's why it's proposed as a language extension).

It is a language extension. As for a method (#interpolate): notice that for it to be efficient, it would need to do even more magic than my proposal does (which is just adding a compiler plugin): you would need to store a literal with the "compiled" object on first execution, etc.

The other concern, "applied to every string": this is resolved at compile time. If the compiler finds an "interpolable" string, it applies the transformation and replaces that node with the interpolation bytecodes (one single time). All "non-interpolable" strings (which are most of them) stay the same as before.

@yannij

yannij commented May 19, 2022

It doesn't feel like an extension. It's more like a change to what a String is. If you unknowingly type something valid to interpolate then something else happens (and it feels a bit magical). The curly brace, "{}", extension for Array construction is an obvious syntax extension. This String "extension" proposal is not obvious. Perhaps a different kind of marker could work - maybe something like:
^ $'This is an interpolated String: Five is {five}'

@theseion

theseion commented May 19, 2022

I've had this snippet lying around for a couple of years and never got around to doing anything about it. If an implementation comes out of this issue then maybe this can help. I took inspiration from Python btw :).

pyFormat: aCollection
	| reader position |
	reader := self readStream.
	position := 1.
	^ self species new: self size streamContents: [ :stream |
		| index |
		[ reader atEnd ] whileFalse: [
			stream nextPutAll: (reader upTo: ${).
			(reader atEnd not and: [ reader peek = ${])
				ifTrue: [
					"escaped {}"
					stream nextPut: reader next ]
				ifFalse: [
					"format"
					index := reader upTo: $}.
					stream nextPutAll: (index isEmpty
						ifTrue: [
							aCollection size < position
								ifFalse: [
									position := position + 1.
									(aCollection at: position - 1) asString ]
								ifTrue: [ '' ] ]
						ifFalse: [
							(index isAllDigits and: [ aCollection size >= (index := index asInteger) ])
								ifTrue: [ (aCollection at: index) asString ]
								ifFalse: [ '' ] ]) ] ] ]

'foo {} bar {{} {{{3}{}{}' pyFormat: #(1 2 3)
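
For readers more familiar with Python, the rules the snippet appears to follow can be sketched as below. This is only an illustration, not a port that reproduces every edge case: `py_format` is a hypothetical name, and the rules are an assumption read off the Smalltalk code ('{}' takes the next argument, '{n}' takes the n-th argument counting from 1, '{{' is a literal brace, anything out of range or non-numeric expands to nothing).

```python
def py_format(template, args):
    """Positional formatting, loosely modelled on the pyFormat: snippet."""
    out = []
    next_pos = 0  # index of the next argument consumed by a bare '{}'
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "{":
            out.append(ch)
            i += 1
            continue
        if template[i + 1 : i + 2] == "{":
            # '{{' is an escaped brace
            out.append("{")
            i += 2
            continue
        close = template.find("}", i + 1)
        if close == -1:
            # unterminated field: treat the rest of the string as the field
            close = len(template)
        field = template[i + 1 : close]
        if field == "":
            # bare '{}': take the next positional argument, if any is left
            if next_pos < len(args):
                out.append(str(args[next_pos]))
                next_pos += 1
        elif field.isdecimal() and 1 <= int(field) <= len(args):
            # '{n}': explicit 1-based index
            out.append(str(args[int(field) - 1]))
        i = close + 1
    return "".join(out)


result = py_format("foo {} bar {{} {{{3}{}{}", [1, 2, 3])
```

Note that explicit indices do not advance the counter used by bare '{}' fields, which matches how `position` is handled in the Smalltalk version.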

@estebanlm
Member Author

estebanlm commented May 19, 2022

> It doesn't feel like an extension. It's more like a change to what a String is. If you unknowingly type something valid to interpolate then something else happens (and it feels a bit magical). The curly brace, "{}", extension for Array construction is an obvious syntax extension. This String "extension" proposal is not obvious. Perhaps a different kind of marker could work - maybe something like:

No, a string is still a string. What we are changing is the method, when we compile it.
I just added this example to the proposal, but it is useful here too:

In the compiled method, this code:

greet := 'Hello {name}, what can I do for you?'.

will become something like:

greet := StringInterpolator 
	interpolate: 'Hello {name}, what can I do for you?'
	withAssociations: { #name -> name }.

EDIT: TBH, it can even be replaced by a call to format:, heh
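
The rewrite above can be pictured as two steps: at compile time, collect the variable names found between braces; at run time, substitute the bound values. A minimal Python sketch of that idea follows. The names `find_names` and `interpolate` are hypothetical (they are not the prototype's API), and the sketch assumes simple identifiers only, with no escaping and no arbitrary expressions.

```python
import re

# Matches a simple identifier between braces, e.g. {name}.
_FIELD = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_names(template):
    """'Compile-time' step: list the variable names a template refers to."""
    return _FIELD.findall(template)


def interpolate(template, bindings):
    """'Run-time' step: replace each {name} with its bound value."""
    return _FIELD.sub(lambda m: str(bindings[m.group(1)]), template)


template = "Hello {name}, what can I do for you?"
names = find_names(template)  # the names the rewrite would capture
greet = interpolate(template, {"name": "Ada"})
```

A string without any `{name}` field yields an empty list of names, which corresponds to the "non-interpolable" case where the literal is left untouched.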

@pavel-krivanek

One other possibility is to have a "general literal", something of the form MyClass(some content). The compiler would then take the string inside the brackets (plus some compilation context) and ask the class to build a literal object from it. I would prefer brackets because they occur in pairs in 99% of cases.

It may not be the optimal solution for this particular case (we could use something short like F(Hello {name}, what can I do for you?)), and I'm not sure it would handle variable processing easily, but it would be so general that it could be used for many other cases like building STON literals, JSON literals, SQL queries,

@estebanlm
Member Author

> One other possibility is to have a "general literal". Something in form MyClass(some content). The compiler would then take the string inside brackets (+some compilation context) and ask the class to build a literal object from it. I would prefer brackets because they occur in pairs in 99% of cases.

lol, this is another phep I am thinking about... for when I have time to work on it, so most probably not this iteration, heh.
Unless you want to present it... it is a lot of work, I warn you!

@mjr104

mjr104 commented May 20, 2022

I think adding this feature is a good idea. I would like to be able to write less verbose Smalltalk that assembles Strings with tokenised parts.

I think it's worth looking at how the Python language has evolved string interpolation. They too have accumulated a number of different ways of achieving it, similar to what is described in this proposal. I find their latest incarnation, f-strings, very nice to use. I think this is because the code is more compact and readable. In my own Smalltalk programming I often come across the bindWith:With:With: and expandMacros: variants, and I tend to use the latter. However, the code is very verbose and often needs formatting across multiple lines. I also don't like the choice given to programmers in my system. You can pick any of them and the inconsistency is jarring.

Implementing something like f-strings would, I'm sure, be a lot of work, but the progression is useful to study. It's interesting that Python has a specific syntax, f' ' or F' ', so that you have to opt in. This of course adds more syntax to the language, and you would have to take a view on whether the extra knowledge required is worth the benefit in the Smalltalk context. I would say it is, having some experience performing string interpolation in both languages. Of course such code will become non-portable, but then a compiler extension is going to be non-portable anyway. Portability may not be a high concern, but it could be possible to automatically rewrite such code to a format:-style equivalent.

If I understand the current proposal, any string in the system that contains the interpolation marker {} would change its behaviour when recompiled. This could generate subtle problems with existing custom uses of {} in strings. Reasoning about such changes can also be hard if interpolating strings are stored in databases or files.
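
To make the opt-in point concrete, here is a small Python illustration of how the prefix separates the two kinds of literal. It only shows standard Python behaviour; the variable names are made up for the example.

```python
name = "world"

plain = "Hello {name}"      # ordinary literal: the braces are kept verbatim
opted_in = f"Hello {name}"  # f-prefix: {name} is evaluated when the literal is evaluated

# A template loaded from a file or database is just data. It is only
# expanded when the code explicitly asks for it.
stored = "Hello {name}"
expanded = stored.format(name=name)
```

Because only literals carrying the prefix are rewritten, existing strings that happen to contain braces keep their meaning.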

@gcotelli
Member

gcotelli commented May 20, 2022

Hi, I found this very useful.

I really would like to see in the final proposal:

  • A formal definition of the Interpolation syntax
  • What kind of expressions can be used inside the interpolated section? Any valid expression? All the examples use temp variables.
  • Does it include escaping some sequences, like cr, tab, lf, the OS platform line delimiter, or Unicode scalars, or will we need to send another message to expand these?
  • How to escape the interpolated section delimiter ({ or whatever is used)
  • How are the interpolated values printed? Will it always send printString to the object resulting from evaluating the expression? Can a formatter be configured somehow and used instead of printString? What happens if the interpolated value is already a string: will it be double-quoted, or just concatenated?
  • Are the interpolation/escaping mechanics extendable by the user?

For example, Swift uses \ for escape sequences and interpolation. So \r is replaced by cr, \(expression) is interpolated by evaluating the expression, and \u{301} is replaced by the Unicode scalar with value 301:

String literals can include the following special characters:

The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark) and \' (single quotation mark)
An arbitrary Unicode scalar value, written as \u{n}, where n is a 1–8 digit hexadecimal number (Unicode is discussed in Unicode below)

In Swift the interpolation support allows constants, variables, literals, and any expression:

Each item that you insert into the string literal is wrapped in a pair of parentheses, prefixed by a backslash (\):

let multiplier = 3
let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
// message is "3 times 2.5 is 7.5"

@dionisiydk

In addition to the compiler changes, proper tooling support is required. I see the code highlighting is implemented, which is cool. But other features should be supported as well:

  • Variable rename refactoring is probably the main one.
    • Source string needs to be modified during refactoring
    • In place rename with a shortcut when variable is selected inside the string
  • Variable references.
    • They should be found inside interpolated strings.
    • Navigation menu (cmd+t) for the variables selected inside the string
  • How will the debugger work? With no expression support (just var names) it looks simple. No need to step over the expression inside the string. But StepInto can be quite confusing even with simple vars. You would accidentally dive into the interpolation code.
  • what else?

@dionisiydk

I played with the PR. Arbitrary expressions really do work. But the debugger knows nothing about them. I wonder how difficult it would be to implement proper stepping inside such a string. In its current form the debugger simply highlights the entire string and nothing changes during steps. It feels like StepOver does nothing when you click it.
Also, the highlighting of the current node is broken for the other, regular parts of the method. Try to step into a method where an interpolated string is used.
Some work will be needed on the debugger side to correctly support such hidden code rewrites, which the plugin performs under the method AST.

And back to code navigation. With arbitrary expressions we should find senders, class references, etc. inside. It works now, as the expression is really compiled inside the method. But the browser does not highlight the actual references inside the string.

@noha
Member

noha commented May 21, 2022

I just approved the changes: Esteban's addition of a textual description of what the PhEP should actually achieve.

phep-proposal.md Outdated

# Abstract
A proposal for incluide [String interpolation](https://en.wikipedia.org/wiki/String_interpolation) as part of the Pharo language.
Member

small typo: "include", not "incluide"

@dionisiydk

Back to the proposal.
I think that, for consistency with regular Smalltalk syntax, square brackets [] fit much better than curly brackets {}.
Curly brackets define a dynamic array, but an expression inside a string is executed just like a block. So this will be a point of confusion for people learning the language.

@dionisiydk

> Back to the proposal. I think that, for consistency with regular Smalltalk syntax, square brackets [] fit much better than curly brackets {}. Curly brackets define a dynamic array, but an expression inside a string is executed just like a block. So this will be a point of confusion for people learning the language.

The more I think about it, the more I like the block syntax idea. It kind of hides the new complexity added to the language.
Now we will just say that a block inside a string is always evaluated. Nothing else. No need to introduce the new terminology of string interpolation (it is an implementation detail).

phep-proposal.md Outdated

NOTE: This is how the prototype is working now, we still need to solve some minor issues.


### Backwards compatibility
There may be cases where the string interpolation mechanism is incompatible with already existing packages. To allow this packages to be loaded we will add the capability to disable string interpolation as per package or class basis.
Contributor

this packages => these packages
capability => ability

@tesonep tesonep self-requested a review January 24, 2023 10:37
@privat
Contributor

privat commented Jan 30, 2023

String interpolation is not a luxury. So +1 for the principle.

However, the proposal lacks information about the syntax

  • how to escape brackets?
  • what are the semantics of (various kinds of) bad formatting? E.g. is 'hello [name' a syntax error or the literal array as is?
  • how to deal with the numerous existing literals with brackets inside? Language extensions that are not backward compatible are often a lot of unforeseen trouble.

Unrelated, but since the plan is to update string literals, I would also like to have an extended syntax including escape sequences for control characters or Unicode characters, e.g. \n or \u26EF in other languages.

@privat
Contributor

privat commented Feb 1, 2023

I acknowledge the new addition of escaping, and I'm OK with it. But now there are clearly 3 special characters: [, \ and ]. Note that the status of the last one alone is not that clear, but since it is also escaped in the example 'Hello, this is an \[escaped\] string', this suggests that the unescaped version of ] is meaningful (or at least causes an error/warning).

This opens new issues that need to be addressed in the proposal:

  • how to escape \ (classic \\?)
  • what are the semantics of \ followed by a non-special character (e.g. '\y')?
    1. Keep both characters (\y). Like in double-quoted shell string (or Python)?
      • echo "\y"
      • python3 -c 'print("\y")'
    2. eat the \ and keep the character. (y) Like in unquoted shell (or Ruby)?
      • echo \y
      • ruby -e 'puts("\y")'
    3. a previous one plus a warning?
    4. an error?
    5. other?
  • how to deal with existing literals, e.g. there should be a lot of "nice" cases with these special characters in clients of regular expressions, for instance.

Do not get me wrong, I like the proposal, but the history of programming languages shows that dealing with string literals in a consistent, backward-compatible and future-proof way is not easy.
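
The options listed above for \ followed by a non-special character can be compared side by side with a small Python sketch. Everything here is an assumption made for illustration: the function name `unescape`, the `unknown` parameter, and the choice of [, ] and \ as the special characters. It is not how the proposal resolves the question.

```python
SPECIAL = set("[]\\")  # assumed special characters: '[', ']' and the backslash


def unescape(text, unknown="keep"):
    """Resolve backslash escapes under one of the candidate policies.

    unknown="keep"  -> backslash + y stays as both characters (option 1)
    unknown="drop"  -> backslash + y becomes just y           (option 2)
    unknown="error" -> backslash + y raises ValueError        (option 4)
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in SPECIAL:
                out.append(nxt)       # a known escape yields the bare character
            elif unknown == "keep":
                out.append(ch + nxt)  # keep both characters
            elif unknown == "drop":
                out.append(nxt)       # eat the backslash
            else:
                raise ValueError(f"unknown escape \\{nxt} at index {i}")
            i += 2
        else:
            out.append(ch)            # ordinary character (or a trailing backslash)
            i += 1
    return "".join(out)


kept = unescape(r"\y", "keep")
dropped = unescape(r"\y", "drop")
```

Whichever policy is picked, existing literals that contain a backslash (regular expressions, for instance) change meaning under "drop" and "error" but not under "keep", which is one way to weigh the backward-compatibility concern.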

@PharoProject
Member

PharoProject commented Feb 1, 2023 via email

@PharoProject
Member

PharoProject commented Feb 1, 2023 via email

@privat
Contributor

privat commented Feb 1, 2023

> all the other cases are not part of this phep

I disagree here. The proposal currently suggests that 'Hello, this is an \[escaped] string' will be evaluated to the string Hello, this is an [escaped] string thus removing the ability to have Hello, this is an \[escaped] string.

@Ducasse
Member

Ducasse commented Feb 2, 2023

Hi guys

I do not understand why Jean's comments are not being heard. It is super important to have a way to escape a syntax.
We have it for format:, and I can tell you that without it I could not have migrated most of the refactoring tests, because the new class format uses the exact same character as format: uses for its variables, and it took me a while to figure it out.

Now I do not get why addressing the points of Jean would be in another phep.
To me a phep is a place to discuss and produce something good.
And I really appreciate the attitude of Jean. I would like to have a bunch of tests validating his points.

@guillep
Member

guillep commented Feb 3, 2023

> all the other cases are not part of this phep (and we are not intending yet to define a complete escape $\ semantic)... in that case, feel free to send a new phep for it ;)

I don't think this should be part of another phep, this should be clearly defined in this phep, because otherwise many implementations with different semantics are possible, and adoption will be a mess...

@estebanlm
Member Author

estebanlm commented Feb 3, 2023

We do address @privat's concerns: he asks how to escape it, and the phep (after his input) states:

'this is an [unescaped] string'
'this is an \[escaped] string' "note this is the same escape syntax as with #format:"  

what the phep is missing and I will add is:

'this is again an \\[unescaped] string'

Now, what is being asked for is a general escaping mechanism for strings, e.g. after this phep we decide that $\ marks an escape, and we can have escapes à la C, \n \t \r (and we need to always escape $\ so it is taken into account: \\), other than the interpolation string escape, which is a particular case.
Such a general escaping mechanism is complex per se and an important change to how we handle all strings (while the interpolation string is just a particular case), and that's why I think it has to be a separate phep: we can have escaped strings without having interpolated strings.
Now, since the opposite does not hold (we cannot have interpolated strings without escaped strings), this new PHEP (escaping strings) can take precedence over this one (meaning it needs to be implemented first), but a general escaping mechanism cannot be defined in the context of a string interpolation phep.
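
The three examples above ([unescaped], \[escaped] and \\[unescaped]) can be checked against a small Python sketch of the interpolation-specific escape. The function `expand` and the dictionary lookup are hypothetical stand-ins: the sketch assumes a field holds only a variable name, whereas the prototype compiles arbitrary expressions, and it says nothing about a general escaping mechanism.

```python
def expand(template, bindings):
    """Expand [name] fields; backslash-[ is a literal bracket,
    backslash-backslash is a literal backslash."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and template[i + 1 : i + 2] in ("[", "\\"):
            out.append(template[i + 1])  # escaped '[' or escaped backslash
            i += 2
        elif ch == "[":
            close = template.find("]", i + 1)
            if close == -1:
                raise ValueError("unterminated '[' in template")
            # look the field up by name (assumption: names only, no expressions)
            out.append(str(bindings[template[i + 1 : close]]))
            i = close + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


values = {"unescaped": "expanded"}
first = expand("this is an [unescaped] string", values)
second = expand(r"this is an \[escaped] string", values)
third = expand(r"this is again an \\[unescaped] string", values)
```

In the third case the escaped backslash is emitted literally and the bracket that follows is interpolated as usual, which is the behaviour the added example is meant to pin down.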

So, @Ducasse... do not say we are not addressing it: we are addressing it, and this is the explanation of why it has to be a separate PHEP.

@noha
Member

noha commented Dec 20, 2024

Do we have a grace period after a proposed PHEP gets declined and closed? Until now, PHEP has turned out to be a vehicle for documenting features we do not implement. So it is kind of a would-have-been-nice-to-have-that archive.

@PharoProject
Member

PharoProject commented Dec 20, 2024 via email

@estebanlm
Member Author

estebanlm commented Dec 20, 2024

I still intend to work on this but I have not found the time, that's the status ;)

EDIT: ... and since I am the one who is supposed to implement it... it is waiting here (even if it is accepted) until I have time to do it. The real work is adapting all the tools.

@noha
Member

noha commented Dec 20, 2024 via email

@estebanlm
Member Author

We could accept it and assign a number, but then... I still do not have time to work on it (but I swear I will, eventually ;) )

Unless someone else wants to take the implementation burden, which would be nice :P

@estebanlm
Member Author

well, it is on my TODO list for next year, if you want more details... so not sooooo far off.
