Better sorting #50

Merged
merged 9 commits into from
Jan 3, 2022

JohnGiorgi
Owner

@JohnGiorgi JohnGiorgi commented Jan 3, 2022

Overview

This PR is focused on improving the sorting of relations. We still order relations according to their first appearance in the text, with some subtle changes:

  • An entity's offset is now determined by the sum of the start and end characters of its first mention. Previously, we just took the end character offset. This is more informative for overlapping/nested entities. Ditto for entity hints.
  • Relations are now sorted by first considering the head entities' order. Once sorted by the head entity, they are sorted by the tail entity, and so on for n-ary relations.
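
The two rules above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `Span`, `entity_offset`, and `sort_relations` are hypothetical names, an entity is assumed to be a list of `(start, end)` character spans, a relation is assumed to be a tuple of entities in argument order (head, tail, ...), and "first mention" is assumed to mean the mention with the smallest start offset.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]   # (start, end) character offsets of one mention (assumed layout)
Entity = List[Span]      # all mentions of one entity (hypothetical representation)


def entity_offset(mentions: Sequence[Span]) -> int:
    """Return start + end of the entity's first mention."""
    # Assumption: the "first" mention is the one with the smallest start
    # offset (ties broken by the smaller end offset).
    start, end = min(mentions)
    return start + end


def sort_relations(relations: Sequence[Tuple[Entity, ...]]) -> List[Tuple[Entity, ...]]:
    """Sort relations by head offset, then tail offset, and so on."""
    # Tuples compare element by element, so the head entity decides first
    # and later arguments only break ties. This also covers n-ary relations.
    return sorted(relations, key=lambda rel: tuple(entity_offset(e) for e in rel))


# Nested entities: both end at character 20, so an end-offset-only key ties them.
outer = [(0, 20)]
inner = [(10, 20)]
later = [(30, 35)]

ordered = sort_relations([(later, outer), (outer, later), (outer, inner)])
```

Here `outer` and `inner` get offsets 20 and 30, so the nested pair is no longer tied, and `ordered` comes out as `(outer, inner)`, `(outer, later)`, `(later, outer)`.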

The idea is that this ordering might be easier for a model to learn and therefore improve performance. In practice, the results are mixed, but these changes mostly improve (or at least don't harm) performance.

@JohnGiorgi JohnGiorgi merged commit 4a2c3d7 into main Jan 3, 2022
@JohnGiorgi JohnGiorgi deleted the better-sorting branch January 3, 2022 16:06