
The scripts include: text segmentation into sentences, and removal of numbers, diacritics, repeated characters, non-Arabic words, punctuation, etc.


Nagoudi/Clean-and-Segmentation-of-Arabic-Text


Input: Input.txt (the encoding must be UTF-8)

Output: a file named after the input file with a "SEG_" prefix (ex: filename.txt ---> SEG_filename.txt)
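The input and output convention can be sketched as follows. This is only an illustration: the helper names `read_input` and `output_name` are hypothetical and are not taken from Seg_clean.py.

```python
from pathlib import Path


def read_input(path):
    """Read the whole input file, decoding it as UTF-8."""
    return Path(path).read_text(encoding="utf-8")


def output_name(input_path, prefix="SEG_"):
    """Build the output file name by prefixing the input file's base name."""
    return prefix + Path(input_path).name


if __name__ == "__main__":
    # Only the file name is prefixed; any directory part is dropped.
    print(output_name("filename.txt"))
```

Decoding explicitly as UTF-8 avoids depending on the platform's default encoding, which matters for Arabic text.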

Note: * The output file is written to the same directory as the script

   **  The script is intended for Python 3

   *** The script.py must be in the same folder as Seg_clean.py
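For readers who want a feel for the cleaning steps named in the description, here is a minimal sketch in Python 3. It is not the code in Seg_clean.py: the function names, the regular expressions, the choice of sentence delimiters, and the rule of collapsing a character repeated three or more times are all assumptions made for illustration.

```python
import re

# Arabic diacritics (tashkeel) plus the superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), the elongation character.
TATWEEL = "\u0640"
# Anything that is not a basic Arabic letter or whitespace:
# digits, Latin words, punctuation, symbols.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
# The same character three or more times in a row.
REPEATS = re.compile(r"(.)\1{2,}")
# Assumed sentence delimiters: . ! ? Arabic question mark, newline.
SENTENCE_END = re.compile(r"[.!?\u061F\n]+")


def clean_sentence(sentence):
    """Apply the cleaning steps to one sentence."""
    text = DIACRITICS.sub("", sentence)
    # Tatweel lies inside the Arabic letter range, so drop it first.
    text = text.replace(TATWEEL, "")
    text = NON_ARABIC.sub(" ", text)
    # Collapse long runs of one character down to a single occurrence.
    text = REPEATS.sub(r"\1", text)
    # Normalize whitespace.
    return " ".join(text.split())


def segment_and_clean(text):
    """Split text into sentences, clean each, and drop empty results."""
    cleaned = (clean_sentence(s) for s in SENTENCE_END.split(text))
    return [s for s in cleaned if s]


if __name__ == "__main__":
    sample = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627 123. hello \u062C\u0645\u064A\u064A\u064A\u064A\u0644!"
    for line in segment_and_clean(sample):
        print(line)
```

Segmenting before cleaning matters in this sketch, because the punctuation that marks sentence boundaries is itself removed by the cleaning step.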
