[String_] Add rawValue attribute #831

Merged: 6 commits, May 15, 2022

TomasVotruba
Contributor

In Rector we have to hack around tokens to get the original value, because the parse() method flattens original information:

public static function parse(string $str, bool $parseUnicodeEscape = true) : string {
    $bLength = 0;
    if ('b' === $str[0] || 'B' === $str[0]) {
        $bLength = 1;
    }
    if ('\'' === $str[$bLength]) {
        return str_replace(
            ['\\\\', '\\\''],
            ['\\', '\''],
            substr($str, $bLength + 1, -1)
        );
    } else {
        return self::parseEscapeSequences(
            substr($str, $bLength + 1, -1), '"', $parseUnicodeEscape
        );
    }
}
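
To illustrate what "flattens" means here, the following is a minimal, self-contained sketch of the single-quoted branch only. The helper name `parseSingleQuoted` is hypothetical and not part of PHP-Parser; it only mirrors the `str_replace` call quoted above. It shows two different source literals collapsing to the same parsed value, so the original spelling cannot be recovered from the value alone.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the single-quoted branch of parse() quoted above.
// Assumption: the literal has no b/B prefix, so the quotes are at both ends.
function parseSingleQuoted(string $literal): string
{
    // Drop the surrounding quotes, then unescape \\ and \'.
    return str_replace(['\\\\', '\\\''], ['\\', '\''], substr($literal, 1, -1));
}

// Two different source literals, written as nowdocs so that no escaping
// is applied by this script itself.
$rawA = <<<'RAW'
'a\\b'
RAW;
$rawB = <<<'RAW'
'a\b'
RAW;

$parsedA = parseSingleQuoted($rawA);
$parsedB = parseSingleQuoted($rawB);

var_dump($rawA === $rawB);       // do the source literals match?
var_dump($parsedA === $parsedB); // do the parsed values match?
```

The raw literals differ while the parsed values are identical, which is the information loss a raw-value attribute is meant to avoid.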

We could also use an attribute for the value, but that would require more grammar changes.

Closes #577

Also fixes #576

@dryabov
dryabov commented May 9, 2022

Isn't getStartFilePos/getEndFilePos sufficient to get the raw data? Note that it requires manually creating the lexer and passing it to the parser, e.g.:

use PhpParser\Lexer\Emulative;
use PhpParser\ParserFactory;

$options = ['usedAttributes' => ['comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos']];
$lexer = new Emulative($options);
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7, $lexer);

$code = '<?php
Foo::bar("This is a string with escape sequence \x41");
';
$ast = $parser->parse($code);
$node = $ast[0]->expr->args[0]->value;

$startFilePos = $node->getStartFilePos();
$endFilePos = $node->getEndFilePos();
var_dump(substr($code, $startFilePos, $endFilePos - $startFilePos + 1));
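
The `+ 1` in the `substr()` call above follows from the end offset being inclusive, i.e. pointing at the last character of the node. A minimal sketch of that arithmetic without the parser is shown below; as an assumption for illustration, the offsets are computed with `strpos`/`strlen` rather than taken from PHP-Parser, and the shortened sample code is made up.

```php
<?php
declare(strict_types=1);

// Sample source, kept as a nowdoc so the escape sequence stays literal.
$code = <<<'CODE'
<?php
Foo::bar("escape \x41");
CODE;

// Stand-ins for the offsets the parser would report for the string node
// (assumption: derived here with strpos/strlen instead of PHP-Parser).
$literal = '"escape \x41"';
$startFilePos = strpos($code, $literal);
$endFilePos = $startFilePos + strlen($literal) - 1; // inclusive: offset of the closing quote

// Inclusive end offset, so the length is end - start + 1.
$raw = substr($code, $startFilePos, $endFilePos - $startFilePos + 1);
var_dump($raw === $literal);
```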

@TomasVotruba
Contributor Author

Avoiding working with tokens to get the original value is exactly the point. During a Rector upgrade it might be crucial to know the original value, in order not to replace something deprecated.

@dryabov
dryabov commented May 10, 2022

But get(Start|End)FilePos is the standard way to access raw data; e.g. the well-known Psalm works this way (see Psalm\Internal\Provider\StatementsProvider::parseStatements for the parser initialization there, and search its source code for getStartFilePos() and getAttribute('startFilePos') for usage examples).

@TomasVotruba
Contributor Author

TomasVotruba commented May 10, 2022

That seems like a workaround. The change of the value is the issue.

@TomasVotruba changed the title from "[String_] Add raw value as 3rd optional arg" to "[String_] Add rawValue attribute" on May 12, 2022
@TomasVotruba
Contributor Author

Added rawValue attribute ✔️

@TomasVotruba
Contributor Author

Updated and rebased 👍 Ready for review ✔️

@nikic merged commit 5d83adc into nikic:master May 15, 2022
@TomasVotruba deleted the tv-raw-value branch May 15, 2022 21:14
@TomasVotruba
Contributor Author

Thanks 👍

@nikic
Owner

nikic commented May 16, 2022

Probably this should be added to EncapsedStringPart as well?

@TomasVotruba
Contributor Author

TomasVotruba commented May 16, 2022

What exactly do you mean? Having a fromString() method there? (I have never used this node much, so I don't know it well.)

@nikic
Owner

nikic commented May 16, 2022

I mean having a rawValue attribute. It seems inconsistent to only add it to non-interpolated strings.

@TomasVotruba
Contributor Author

I'm on it 😉

@TomasVotruba
Contributor Author

See #837

Linked issue: Get unparsed string from String_ for escaped sequences
3 participants