Overriding the equality check defined in PosTokenImpl.
This is needed because some of the uses of equals() for ERToken need to worry about more than just the token text. See, for example, mapToTokenList() in EntityLookup5 - it uses .indexOf() to find where a given token (including its offset, etc.) originally occurred in a differently-processed list of tokens. If we only look at tokenText, and the token in question occurred multiple times in the token list, then we will essentially lop off everything that happened after the first occurrence of the token.

This is a particularly big issue with punctuation, as it is quite common for periods etc. to occur multiple times in a given input.

If we've done phrase chunking, this is _less_ of a risk but definitely can still happen.
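
As a rough illustration of the failure mode described above, here is a minimal, self-contained sketch. `TextOnlyToken`, `OffsetAwareToken`, and `IndexOfDemo` are hypothetical stand-ins, not classes from this repository, and they carry only a text and an offset field. The sketch assumes a text-only `equals()` like the one the message attributes to PosTokenImpl and shows how `List.indexOf()` then resolves a repeated period to its first occurrence.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in: equals() looks at the token text only.
class TextOnlyToken {
    final String text;
    final int offset;

    TextOnlyToken(String text, int offset) {
        this.text = text;
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TextOnlyToken && text.equals(((TextOnlyToken) o).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }
}

// Hypothetical stand-in: equals() also takes the offset into account.
class OffsetAwareToken {
    final String text;
    final int offset;

    OffsetAwareToken(String text, int offset) {
        this.text = text;
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof OffsetAwareToken)) {
            return false;
        }
        OffsetAwareToken other = (OffsetAwareToken) o;
        return text.equals(other.text) && offset == other.offset;
    }

    @Override
    public int hashCode() {
        return Objects.hash(text, offset);
    }
}

class IndexOfDemo {
    // Looks up the second period (offset 12) using text-only equality.
    static int textOnlyIndex() {
        List<TextOnlyToken> tokens = Arrays.asList(
                new TextOnlyToken("Fever", 0),
                new TextOnlyToken(".", 5),
                new TextOnlyToken("Cough", 7),
                new TextOnlyToken(".", 12));
        return tokens.indexOf(new TextOnlyToken(".", 12));
    }

    // Same lookup, but equality also compares the offset.
    static int offsetAwareIndex() {
        List<OffsetAwareToken> tokens = Arrays.asList(
                new OffsetAwareToken("Fever", 0),
                new OffsetAwareToken(".", 5),
                new OffsetAwareToken("Cough", 7),
                new OffsetAwareToken(".", 12));
        return tokens.indexOf(new OffsetAwareToken(".", 12));
    }

    public static void main(String[] args) {
        System.out.println(textOnlyIndex());    // prints 1: the first period matches
        System.out.println(offsetAwareIndex()); // prints 3: the intended period
    }
}
```

With text-only equality the lookup stops at index 1, so any code that slices the list at that index would drop "Cough" and the final period.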
stevenbedrick committed May 6, 2023
1 parent 9b06071 commit f89617c
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions src/main/java/gov/nih/nlm/nls/metamap/prefix/ERTokenImpl.java
@@ -62,4 +62,23 @@ public void setPartOfSpeech(String partOfSpeech) {
public String toString() {
return this.tokenText + "|" + this.tokenClass + "|" + this.offset + "|" + this.partOfSpeech;
}

/**
* Overriding the equality check defined in PosTokenImpl, since some of the uses of equals()
* for ERToken need to worry about more than just the token text. See, for example, mapToTokenList()
* in EntityLookup5 - it uses .indexOf() to find where a given token (including its offset, etc.)
* originally occurred in a differently-processed list of tokens. If we only look at tokenText,
* and the token in question occurred multiple times in the token list, then we will essentially
* lop off everything that happened after the first occurrence of the token.
*
* This is a particularly big issue with punctuation, as it is quite common for periods etc. to occur
* multiple times in a given input.
*
* If we've done phrase chunking, this is <i>less</i> of a risk but definitely can still happen.
*
* @param anotherToken token to compare with
* @return true if the two tokens share the same text, class, offset, and PoS
*/
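public boolean equals(Object anotherToken) // instanceof is false for null and for non-ERTokenImpl arguments
{ return anotherToken instanceof ERTokenImpl && this.toString().equals(((ERTokenImpl)anotherToken).toString()); }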
}
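
Java's general contract for `Object` expects two objects that are equal under `equals()` to return the same `hashCode()`. The diff does not show whether ERTokenImpl or PosTokenImpl overrides `hashCode()`, so the following is only a sketch of how a matching hash could be derived from the same string that `equals()` compares. `SketchToken` and `HashSketchDemo` are hypothetical names, and the field types (in particular `int` for the offset) are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class for illustration only; field types are assumptions.
class SketchToken {
    private final String tokenText;
    private final String tokenClass;
    private final int offset;
    private final String partOfSpeech;

    SketchToken(String tokenText, String tokenClass, int offset, String partOfSpeech) {
        this.tokenText = tokenText;
        this.tokenClass = tokenClass;
        this.offset = offset;
        this.partOfSpeech = partOfSpeech;
    }

    @Override
    public String toString() {
        return this.tokenText + "|" + this.tokenClass + "|" + this.offset + "|" + this.partOfSpeech;
    }

    @Override
    public boolean equals(Object other) {
        // instanceof is false for null and for unrelated types, so this never throws.
        return other instanceof SketchToken && this.toString().equals(other.toString());
    }

    @Override
    public int hashCode() {
        // Hash the same string that equals() compares, so equal tokens hash alike.
        return this.toString().hashCode();
    }
}

class HashSketchDemo {
    // True when a hash-based collection can find a token equal to one it holds.
    static boolean setFindsEqualToken() {
        Set<SketchToken> seen = new HashSet<>();
        seen.add(new SketchToken(".", "pd", 12, "."));
        return seen.contains(new SketchToken(".", "pd", 12, "."));
    }
}
```

This matters only if tokens end up in hash-based collections such as `HashSet` or as `HashMap` keys; `List.indexOf()` relies on `equals()` alone.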
