-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix URL parser for parentheses and Chinese Characters #43
base: master
Are you sure you want to change the base?
Conversation
May I suggest adding a test to reproduce that? |
@@ -1,6 +1,8 @@ | |||
import 'package:linkify/linkify.dart'; | |||
|
|||
void main() { | |||
print(linkify("Made by https://cretezy.com [email protected]")); | |||
print(linkify( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, mind reverting this and adding a test instead?
Sorry, I'm quite new to GitHub and it looks like I'm not quite aware of how PRs and tests work. The last 5 "experimental" commits were not meant for merging. I was trying to find the best Regex that meets the need of my forum (where users may post links). Although I've found the best Regex for my app, it may not be the best for general purposes and thus might not be suitable for merging. Since this is only a simple Regex change, I suggest closing this PR and manually "merge" this commit, if needed. As for tests, since I'm not familiar with how it works..., could you please provide instructions on how to add one? Sorry for troubling you... |
Hey, thanks for the contribution! I could do that, or if you wish I can walk you through bring up your PR to something we can merge together (and you learn!). Let me know! |
Thank you very much for your kind offer :) Where should we begin? |
After experimenting with the URL parser, I came up with several ideas: Unexpected Unicode CharactersCurrently, an URL element will end only when there is a whitespace character matched by According to RFC3986, encoded URLs may only contain the following characters:
which translates to
(not sure why but it looks like the syntax is a little different in Dart, why does This will prevent unexpected unicode characters from being matched into our LinkElement. ParenthesesParentheses appear as sub-delimiters in RFC3986, so they are technically allowed as a part of the URL. But in practical context they are often confused with the surrounding text (e.g. in Markdown, where links always ends with |
Currently, URLs like
http://foo.com)你好
will be interpreted as is. Instead, it should be separated intohttp://foo.com
and)你好
(according to URL conventions).Although parentheses are by definition allowed in URLs, they usually appear encoded. So parentheses that appear at the end of a URL are, for most of the time, not part of it.
Chinese characters are simply not allowed in URLs.
This PR fixes those issues.