Issue with emoji #43

Open
senhan07 opened this issue Sep 5, 2023 · 3 comments
@senhan07

senhan07 commented Sep 5, 2023

It's not working when there is an emoji inside a Markdown file. It should be able to skip/ignore that character.

PS C:\Users\Lenovo\Desktop\stable-diffusion-wiki> md-translate C:\Users\Lenovo\Desktop\stable-diffusion-wiki\language\en\berkontribusi.md -F id -T en -P deepl -D  -v
ERROR:md_translate.application:Error processing file: C:\Users\Lenovo\Desktop\stable-diffusion-wiki\language\en\berkontribusi.md
ERROR:md_translate.application:'charmap' codec can't decode byte 0x9d in position 61: character maps to <undefined>
Traceback (most recent call last):
  File "C:\Users\Lenovo\.local\pipx\venvs\md-translate\lib\site-packages\md_translate\application.py", line 106, in process_file
    document = MarkdownDocument.from_file(
  File "C:\Users\Lenovo\.local\pipx\venvs\md-translate\lib\site-packages\md_translate\document\document.py", line 102, in from_file
    file_content = target_file.read_text()
  File "C:\Program Files\Python310\lib\pathlib.py", line 1135, in read_text
    return f.read()
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 61: character maps to <undefined>
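
Reading the traceback: `read_text()` is called without an `encoding` argument, so on this Windows machine Python falls back to the locale encoding (cp1252, per the `cp1252.py` frame). The UTF-8 encoding of 🤝 ends in byte 0x9d, which cp1252 leaves undefined. A minimal sketch of the failure and of reading the file as UTF-8 explicitly; `read_markdown` and the sample text are hypothetical names for illustration, not md-translate's own code:

```python
import tempfile
from pathlib import Path


def read_markdown(path: Path) -> str:
    """Read a Markdown file as UTF-8, independent of the platform locale.

    Hypothetical helper for illustration; not part of md-translate.
    """
    return path.read_text(encoding="utf-8")


sample = "Terima kasih 🤝"      # made-up sample text containing an emoji
raw = sample.encode("utf-8")    # the emoji becomes the bytes F0 9F A4 9D

# Decoding the same bytes as cp1252 reproduces the error from the traceback.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    failing_byte = raw[exc.start]   # the byte the codec could not map

# Reading with an explicit UTF-8 encoding round-trips the text unchanged.
with tempfile.TemporaryDirectory() as tmp:
    md_file = Path(tmp) / "sample.md"
    md_file.write_bytes(raw)
    round_tripped = read_markdown(md_file)
```

This assumes the Markdown file is saved as UTF-8. If so, the failure is about the decoding step rather than the emoji itself, and any non-cp1252 character could trigger it.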
@ilyachch
Owner

ilyachch commented Sep 5, 2023

Please attach the part of the document being translated that causes the problem.

@senhan07
Author

senhan07 commented Sep 5, 2023

This is the file:
berkontribusi.md

@ilyachch
Owner

ilyachch commented Sep 5, 2023

Thank you. I'll try to solve it.
The obvious solution is to use the emoji library, whose demojize function converts 🤝 to :handshake:. That raises a new problem, though: if the target language is not English, translation services may also translate :handshake: to :handschlag: (in German, for example). So I have to think about how to make this work robustly.

For now, I highly recommend removing the emojis, translating the document, and then putting the emojis back.
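
The remove, translate, restore workaround can be scripted. A rough sketch, assuming numeric placeholders such as `[[0]]` survive translation better than word-based codes like :handshake:. Everything here is hypothetical: `EMOJI_RE`, `mask_emojis`, `unmask_emojis`, and the placeholder format are made up for illustration, the regex covers only common emoji blocks, and `str.upper` merely stands in for a real translation call.

```python
import re

# Rough emoji matcher (assumption): common pictograph blocks plus the
# variation selector and zero-width joiner, so emoji sequences stay together.
EMOJI_RE = re.compile(
    "[\U0001F000-\U0001FAFF\u2600-\u27BF]"
    "[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]*"
)
PLACEHOLDER_RE = re.compile(r"\[\[(\d+)\]\]")


def mask_emojis(text):
    """Replace each emoji run with a numbered placeholder; return text and emojis."""
    found = []

    def _swap(match):
        found.append(match.group(0))
        return f"[[{len(found) - 1}]]"

    return EMOJI_RE.sub(_swap, text), found


def unmask_emojis(text, found):
    """Put the stored emojis back in place of their placeholders."""

    def _restore(match):
        index = int(match.group(1))
        # Leave unknown placeholders untouched instead of raising.
        return found[index] if index < len(found) else match.group(0)

    return PLACEHOLDER_RE.sub(_restore, text)


masked, found = mask_emojis("Thanks 🤝 all")
translated = masked.upper()          # stand-in for the translation service
restored = unmask_emojis(translated, found)
```

Whether a given translation service leaves `[[0]]` untouched is not guaranteed, and a document that already contains text of that form would collide with the placeholders, so this is a starting point rather than a robust fix.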

@ilyachch ilyachch self-assigned this Sep 5, 2023