extract-chinese

Extract Chinese and English text from two documents and match the sentences that have the same meaning.

Getting Started

This is a Python project that extracts Chinese and English sentences from two PDFs and matches them by the cosine similarity of their embedding vectors.
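The matching step can be illustrated with a minimal sketch. It assumes each sentence has already been turned into an embedding vector (in this project that would presumably come from a sentence_transformers model, which the README does not name). The function names `cosine` and `match_sentences`, the toy vectors, and the 0.5 threshold are hypothetical and are not taken from the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_u * norm_v)


def match_sentences(cn_vecs, en_vecs, threshold=0.5):
    """For each Chinese vector, find the best-scoring English vector.

    Returns (cn_index, en_index, score) tuples, keeping only pairs whose
    score is at least `threshold` (a hypothetical cutoff).
    """
    matches = []
    for i, cn in enumerate(cn_vecs):
        best_j, best_score = -1, -1.0
        for j, en in enumerate(en_vecs):
            score = cosine(cn, en)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            matches.append((i, best_j, best_score))
    return matches


# Toy 3-dimensional "embeddings" standing in for real model output.
cn_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
en_vecs = [[0.0, 0.9, 0.1], [0.8, 0.1, 0.0]]
pairs = match_sentences(cn_vecs, en_vecs)
print([(i, j) for i, j, _ in pairs])  # [(0, 1), (1, 0)]
```

Each Chinese sentence is paired with at most one English sentence here; two Chinese sentences may still map to the same English one, so a stricter one-to-one assignment would need an extra step.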

pip install pdfplumber
pip install nltk
pip install jieba
pip install sentence_transformers
...

Open a Python console and download the NLTK punkt tokenizer data:

import nltk
nltk.download('punkt')
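The punkt data is what NLTK uses to split English text into sentences. As a rough sketch of how sentence splitting for both languages might look: the English side calls `nltk.sent_tokenize`, while the Chinese side uses a simple punctuation-based regular expression, which is an assumption for illustration and not necessarily how this repository does it. The function names `split_english` and `split_chinese` are hypothetical.

```python
import re

# A run of non-terminator characters, optionally followed by one
# Chinese sentence terminator (。！？；).
_CN_SENTENCE = re.compile(r"[^。！？；]+[。！？；]?")


def split_chinese(text):
    """Split Chinese text into sentences at 。！？； (assumed rule)."""
    return [s.strip() for s in _CN_SENTENCE.findall(text) if s.strip()]


def split_english(text):
    """Split English text into sentences with NLTK's punkt tokenizer."""
    import nltk  # imported lazily so split_chinese works without NLTK

    return nltk.sent_tokenize(text)


print(split_chinese("你好。今天天气很好！你呢"))
# ['你好。', '今天天气很好！', '你呢']
```

`split_english` only works once the punkt download above has completed.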

Then set the required environment variables.

Repository files:

data
.gitignore
Batch2.py
BatchSimilarity.py
CrossMatching.py
ExtractCnFromPDF.py
ExtractEnFromPDF.py
NewExtracterCn.py
NewExtracterEn.py
NewSimilarity.py
README.md
main.py
output_PyMiner.txt
output_PyPDF2.txt
requirements.txt
test_PyMiner.py
test_PyPDF2.py

jwest951227/extractorChinese
