Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error message on parallel turtle parsing of scattered base and prefix declarations #1807

Merged
merged 8 commits into from
Feb 18, 2025

Conversation

RobinTF
Copy link
Collaborator

@RobinTF RobinTF commented Feb 14, 2025

The parallel Turtle parser requires that all PREFIX and BASE declarations stand in a single block at the beginning of the file.
Until now, prefix declarations after this initial block led to an exception about the prefix being undeclared, which was highly misleading. With this PR, this case triggers a proper exception about PREFIX declarations being illegal in the middle of a Turtle file with the parallel parser, together with some guidance to mitigate this issue by either refactoring the input, or deactivating the parallel parser.

Fixes #1794 by providing a better message for the described error.

Copy link

codecov bot commented Feb 14, 2025

Codecov Report

Attention: Patch coverage is 94.87179% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.16%. Comparing base (1570033) to head (6e3e03a).
Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
src/parser/RdfParser.cpp 96.96% 0 Missing and 1 partial ⚠️
src/parser/Tokenizer.h 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1807      +/-   ##
==========================================
+ Coverage   90.02%   90.16%   +0.14%     
==========================================
  Files         396      399       +3     
  Lines       37974    38223     +249     
  Branches     4262     4281      +19     
==========================================
+ Hits        34185    34463     +278     
+ Misses       2493     2473      -20     
+ Partials     1296     1287       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether this fixes the problem in all cases.
I'll have to look into the parallel data structures, can we guarantee that we always see the first exception even when parsing multiple batches at once, where the first has an illegal prefix declaration, and the second one uses the prefix (which is then not found).

Comment on lines 35 to 40
if (successfulParse && prefixAndBaseDisabled_) {
raise(
"@prefix or @base directives need to be at the beginning of the file "
"when using the parallel parser. Use '--parse-parallel false' if you "
"can't guarantee this.");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also refer the user to the mode that parses multiple files.

@sparql-conformance
Copy link

@joka921 joka921 merged commit 8678731 into ad-freiburg:master Feb 18, 2025
22 of 23 checks passed
@RobinTF RobinTF deleted the better-parallel-error-message branch February 18, 2025 16:12
hannahbast pushed a commit that referenced this pull request Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Turtle parser gets prefix table confused.
2 participants