
NLP model for extracting Chinese data from documents


jwest951227/extractorChinese


extract-chinese

Extract Chinese and English sentences from two documents and match the sentences that have the same meaning.

Getting Started

This is a Python project that extracts Chinese and English sentences from two PDFs and matches them by the cosine similarity of their sentence embeddings.

pip install pdfplumber
pip install nltk
pip install jieba
pip install sentence_transformers
...

Open a Python console and run:

import nltk
nltk.download('punkt')

Then set the necessary environment variables.
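For orientation, the matching idea described under Getting Started could look roughly like the sketch below. It is not the repository's actual code. The function names, the `threshold` value, and the model name `paraphrase-multilingual-MiniLM-L12-v2` are all assumptions chosen for illustration. The sketch splits Chinese text on end-of-sentence punctuation, then pairs each Chinese sentence with the English sentence whose embedding has the highest cosine score.

```python
import math
import re
from typing import List, Sequence, Tuple


def split_chinese_sentences(text: str) -> List[str]:
    """Split after Chinese end-of-sentence marks, keeping the mark."""
    parts = re.split(r"(?<=[。！？])", text)
    return [p.strip() for p in parts if p.strip()]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_sentences(
    zh_sentences: Sequence[str],
    en_sentences: Sequence[str],
    zh_vectors: Sequence[Sequence[float]],
    en_vectors: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the project
) -> List[Tuple[str, str, float]]:
    """Pair each Chinese sentence with its best-scoring English sentence."""
    matches = []
    for zh, zh_vec in zip(zh_sentences, zh_vectors):
        best_index, best_score = -1, -1.0
        for i, en_vec in enumerate(en_vectors):
            score = cosine_similarity(zh_vec, en_vec)
            if score > best_score:
                best_index, best_score = i, score
        # Keep the pair only if the best score clears the threshold.
        if best_index >= 0 and best_score >= threshold:
            matches.append((zh, en_sentences[best_index], best_score))
    return matches


def embed(sentences: List[str],
          model_name: str = "paraphrase-multilingual-MiniLM-L12-v2"):
    """Embed sentences with sentence_transformers.

    The model name is an assumption; the project may use a different one.
    """
    from sentence_transformers import SentenceTransformer  # imported lazily
    model = SentenceTransformer(model_name)
    return model.encode(sentences).tolist()
```

In use, the text of each PDF would first be read (for example with `pdfplumber`), the English text split with `nltk.sent_tokenize`, and both sentence lists passed through `embed` before calling `match_sentences`.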
