🪲 Keyword translation fails for Catalan: "at random" -> "att random" #6118

rix0rrr · 2025-01-21T21:14:26Z

Describe the bug

If we translate the keywords in a program from Catalan to English, but the English keywords at random are used, the translation output is att random.

I think this is because the Catalan translation of this keyword (a) is a prefix of the English translation of this keyword (at).

The following tests reproduces it (put it in test_translation_level_06.py):

    def test_specific_catalan_example(self):
        code = textwrap.dedent("""\
            ruleta1 = ruleta at random
            """)

        result = hedy_translation.translate_keywords(code, from_lang="ca", to_lang="en", level=self.level)

        self.assertEqual(result, 'ruleta1 = ruleta at random')

When I add a print statement to the the innards of translate_keywords:

                replaced_line = replace_token_in_line(line, rule, original, target)
                print(line, '|', replaced_line, '# replacing', rule, original, target)    # <-- debugging

I see the following:

ruleta1 = ruleta at random | ruleta1 = ruleta at random # replacing Rule(keyword='random', line=1, start=20, end=25, value='random') random random
ruleta1 = ruleta at random | ruleta1 = ruleta att random # replacing Rule(keyword='at', line=1, start=16, end=19, value=' at ') a at

For the first line: we find the keyword random and replace it with random.
For the second line, we see that we think we matched the keyword a and are replacing it with at, whereas we actually should have matched the keyword a.

This is not a pure language confusion bug, as the Catalan keyword for random is aleatori, so we definitely can distinguish whether we matched an English vs a Catalan keyword. This makes me the think the problem is that a is a prefix of at. Maybe it's a regex thing somewhere?

The text was updated successfully, but these errors were encountered:

rix0rrr · 2025-01-21T21:15:34Z

Oh no, the problem is probably here:

def get_original_keyword(keyword_dict, keyword, line):
    for word in keyword_dict[keyword]:
        if word in line:
            return word

    # If we can't find the keyword, it means that it isn't part of the valid keywords for this language
    # so return original instead
    return keyword

It's not that a is a prefix of at, but it's a proper substring.

rix0rrr · 2025-01-21T21:26:46Z

Keyword replacing on a string basis after we've already parsed is error prone.

Solution: replace based on parse tree, but do substitutions in reverse order so that later string indexes don't shift after a substitution.

In #6118, we discovered that when the source language keywword is a substring of the target language keyword, we make a mistake in the string replacement. This happens because we do the following: - First: parse the entire program - Then: for every line containing a keyword, do a separate search and replace on the line. In this PR, I'm changing the logic a little: as part of parsing, we already know the location in the line where the keyword occurs, so we can just immediately replace that part of the line with the new keyword. Two complications that make this slightly less straightforward than it sounds: - Our grammar rules match whitespace as part of the keyword token, but the whitespace needs to remain in place. - Solution: find the whitespace in the matched token and only substitute the rest of the token. - String substitutions may change the length of the string, and therefore invalidate all the indexes of the parse tree that follow it. - Solution: first collect a list of all subsitutions, then apply them in back-to-front order. That way, all yet-to-be-processed indexes are never disturbed by a substitution. Fixes #6118.

rix0rrr added the bug Something isn't working label Jan 21, 2025

github-project-automation bot added this to Hedy organization board Jan 21, 2025

rix0rrr self-assigned this Jan 21, 2025

AnneliesVlaar mentioned this issue Jan 22, 2025

🪲 Weird inconsistency in teacher page program overview #6043

Open

rix0rrr mentioned this issue Jan 22, 2025

🪲 Fix keyword translation when keywords are substrings of each other #6121

Merged

mergify bot closed this as completed in #6121 Jan 22, 2025

mergify bot closed this as completed in f8003e0 Jan 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🪲 Keyword translation fails for Catalan: "at random" -> "att random" #6118

🪲 Keyword translation fails for Catalan: "at random" -> "att random" #6118

rix0rrr commented Jan 21, 2025

rix0rrr commented Jan 21, 2025

rix0rrr commented Jan 21, 2025

🪲 Keyword translation fails for Catalan: "at random" -> "att random" #6118

🪲 Keyword translation fails for Catalan: "at random" -> "att random" #6118

Comments

rix0rrr commented Jan 21, 2025

rix0rrr commented Jan 21, 2025

rix0rrr commented Jan 21, 2025