Incomplete URL Extraction with Trailing Punctuation #1640

Open
FutureBuddha opened this issue Feb 22, 2025 · 2 comments

Comments

FutureBuddha commented Feb 22, 2025

I use Lychee with the --dump option to collect all links from my website. The workflow involves generating a list of unique URLs and subsequently testing each link.

However, I recently encountered an issue: a URL that ends with a trailing period is not captured correctly. For example, on my website I have the following link:

https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr.

This link is embedded as:

<a href="https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr."/>

When I run lychee --dump, the output only includes:

https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr

The missing trailing period results in an incomplete URL, leading to a broken page when the link is tested.

It would be ideal if the link extraction logic could be adjusted to capture the complete URL, including any trailing punctuation.

Member

mre commented Feb 22, 2025

That's strange; it works for me.

I've tested with both a local web server and a local file. In both cases, the URL gets correctly extracted.
See #1641.

Is there anything unusual about your setup? For example, are you parsing actual HTML files, or are you using a different file extension like .md (i.e. you're trying to dump Markdown files), or no file extension at all?
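
To illustrate why the file type could matter here, the following is a minimal Python sketch, not lychee's actual code. It assumes that an HTML-aware extractor reads the `href` attribute verbatim, while a plain-text extractor trims trailing punctuation so that a sentence-ending period is not treated as part of a URL. Whether lychee behaves this way for non-HTML input is an assumption; the names `HrefCollector`, `links_from_html`, and `links_from_plaintext` are hypothetical.

```python
import re
from html.parser import HTMLParser

# The URL and markup from the report above.
URL = "https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr."
HTML_SNIPPET = f'<a href="{URL}"/>'


class HrefCollector(HTMLParser):
    """Collects href attribute values of <a> tags exactly as written."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_from_html(text):
    """HTML-aware extraction: the attribute value is taken verbatim."""
    parser = HrefCollector()
    parser.feed(text)
    parser.close()
    return parser.links


# Assumed plain-text heuristic: match a URL-like run of characters, then
# drop trailing punctuation that usually belongs to the surrounding sentence.
_URL_RE = re.compile(r"https?://[^\s\"'<>]+")
_TRAILING_PUNCTUATION = ".,;:!?"


def links_from_plaintext(text):
    """Plain-text extraction: trailing punctuation is stripped."""
    return [m.group(0).rstrip(_TRAILING_PUNCTUATION) for m in _URL_RE.finditer(text)]


if __name__ == "__main__":
    print(links_from_html(HTML_SNIPPET))
    print(links_from_plaintext(HTML_SNIPPET))
```

Under these assumptions, the same input yields the full URL when it is treated as HTML and a URL without the final period when it is treated as plain text, which would match the symptom described in the report if the input were not being parsed as HTML.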

Member

mre commented Feb 24, 2025

Merged in the tests. Would it be possible to write down some instructions on how to reproduce your issue?
