
NLP model for extracting Chinese data from documents


jwest951227/extractorChinese


extract-chinese

Extract Chinese and English sentences from two documents and match the sentences that have the same meaning.
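Before sentences can be matched, the extracted Chinese text has to be split into sentences. The README does not say how the project does this. The following is a minimal sketch that assumes splitting on the full-width terminators 。！？ is good enough. It uses a plain regular expression rather than jieba, because jieba performs word segmentation, not sentence splitting. The function name `split_chinese_sentences` is hypothetical.

```python
import re

# Split point: immediately after a full-width sentence terminator.
# Assumption: 。！？ are the only terminators that matter for these documents.
_AFTER_TERMINATOR = re.compile(r"(?<=[。！？])")


def split_chinese_sentences(text):
    """Hypothetical helper: split Chinese text into sentences."""
    # PDF extraction often breaks lines mid-sentence; Chinese has no spaces
    # between words, so the newlines can simply be dropped.
    text = text.replace("\n", "")
    parts = _AFTER_TERMINATOR.split(text)
    # Drop empty pieces (e.g. the one after a final terminator).
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(split_chinese_sentences("今天天气很好。\n我们去公园吧！你去吗？"))
```

English text can be split with NLTK's Punkt-based `nltk.sent_tokenize`, which is why the Punkt data is downloaded during setup.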

Getting Started

This is a Python project that extracts Chinese and English sentences from two PDFs and matches them by the cosine similarity of their sentence embeddings.
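The matching step can be sketched in plain Python as follows. It assumes each sentence has already been turned into an embedding vector (in the project, presumably with `sentence_transformers`, which also provides `util.cos_sim`). For each Chinese sentence, it picks the English sentence with the highest cosine score. The greedy best-match strategy, the `threshold` value, and the function names are illustrative assumptions, not the repository's code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_sentences(zh_embeddings, en_embeddings, threshold=0.5):
    """Hypothetical helper: pair each Chinese sentence with its most similar
    English sentence. Returns (zh_index, en_index, score) tuples whose
    score is at least `threshold` (an assumed cut-off)."""
    pairs = []
    for i, zh_vec in enumerate(zh_embeddings):
        best_j, best_score = -1, -1.0
        for j, en_vec in enumerate(en_embeddings):
            score = cosine(zh_vec, en_vec)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real sentence embeddings.
    print(match_sentences([[1, 0], [0, 1]], [[0, 2], [3, 0]]))
```

Greedy matching lets two Chinese sentences map to the same English sentence. A one-to-one alignment would need an extra constraint, such as removing matched English sentences or using an assignment algorithm.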

pip install pdfplumber
pip install nltk
pip install jieba
pip install sentence_transformers
...
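To confirm the installation worked, you can check whether each package can be found without importing it. The following is a small sketch using the standard library's `importlib.util.find_spec`. The `REQUIRED` list covers only the packages named above, because the install list ends with "..." and other dependencies may also be needed.

```python
import importlib.util

# Import names of the packages named in the install step (list is incomplete).
REQUIRED = ["pdfplumber", "nltk", "jieba", "sentence_transformers"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```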

Open a Python console and download the NLTK Punkt tokenizer data:

import nltk
nltk.download('punkt')

Then set the required environment variables.
