Incomplete URL Extraction with Trailing Punctuation #1640

FutureBuddha · 2025-02-22T14:54:13Z

I use Lychee with the --dump option to collect all links from my website. The workflow involves generating a list of unique URLs and subsequently testing each link.

However, I recently encountered an issue: a URL that ends with a trailing period is not captured correctly. For example, on my website I have the following link:

https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr.

This link is embedded as:

<a href="https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr."/>

When I run lychee --dump, the output only includes:

https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr

The missing trailing period results in an incomplete URL, leading to a broken page when the link is tested.

It would be ideal if the link extraction logic could be adjusted to capture the complete URL, including any trailing punctuation.
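For comparison, here is a minimal sketch of the expected result when the `href` attribute is read by an HTML parser. It uses Python's standard `html.parser` module, not lychee's own extractor, and the names `HrefCollector` and `extract_hrefs` are hypothetical helpers introduced only for this illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the raw href value of every <a> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <a .../>, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return all <a href> values found in the given HTML string."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# The snippet from the report above, unchanged.
SNIPPET = '<a href="https://www.ebl-naturkost.de/maerkte/markt-nuernberg-harsdoerfferstr."/>'

if __name__ == "__main__":
    for href in extract_hrefs(SNIPPET):
        print(href)
```

Because the value is taken verbatim from the attribute, the trailing period belongs to the URL and is kept.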


mre · 2025-02-22T17:51:15Z

That's strange; it works for me.

I've tested with both a local web server and a local file. In both cases, the URL is extracted correctly.
See #1641.

Is your setup special somehow? For example, are you parsing actual HTML files, or are you using a different file extension such as .md (i.e. you're trying to dump Markdown files), or no file extension at all?
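The file type matters because links found in running text are often handled differently from links in HTML attributes. As a rough illustration only (this is an assumed heuristic, not lychee's implementation, and not a confirmed diagnosis of this report), a plain-text link finder might treat a trailing period as sentence punctuation and drop it. The names `URL_PATTERN`, `TRAILING_PUNCTUATION`, and `find_urls_in_text` are hypothetical.

```python
import re

# Assumed heuristic: a URL is a run of non-whitespace after the scheme.
URL_PATTERN = re.compile(r"https?://\S+")

# Characters that usually end a sentence or clause rather than a URL.
TRAILING_PUNCTUATION = ".,;:!?"


def find_urls_in_text(text):
    """Find URLs in plain text, stripping trailing sentence punctuation."""
    return [
        match.group(0).rstrip(TRAILING_PUNCTUATION)
        for match in URL_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "Visit https://www.ebl-naturkost.de/maerkte/"
        "markt-nuernberg-harsdoerfferstr. for details"
    )
    for url in find_urls_in_text(sample):
        print(url)
```

A heuristic like this would return the URL without its final period, which matches the symptom described above if the input were being treated as plain text or Markdown rather than HTML.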

mre · 2025-02-24T07:50:09Z

Merged in the tests. Would it be possible to write down some instructions on how to reproduce your issue?

mre mentioned this issue Feb 22, 2025

Add tests for URL extraction ending with a period #1641

Merged

mre added the waiting-for-feedback label Feb 24, 2025
