Welcome to: https://chunshan-theta.github.io/NLPLab/
Researching the core of NLP. Contact: [email protected]
Implementation of a sentiment model for Chinese
Implementation of the word2vec model
1. Load the training data
> Load stop words ( word2vec/stop_words.txt.py )
> Load the training articles ( word2vec/wiki/ or word2vec/TextForTrain/ )
Main steps:
* Remove special characters: keep only Chinese characters
* Convert Simplified to Traditional Chinese (../nstools/)
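The character-cleaning step above can be sketched with a regular expression that keeps only CJK characters; `keep_chinese_only` is an illustrative helper, not the repo's actual function.

```python
import re

# Minimal sketch: keep only Chinese (CJK) characters, dropping
# punctuation, digits, and Latin text. The repo's exact cleaning rule
# may differ.
def keep_chinese_only(text):
    # \u4e00-\u9fff covers the main CJK Unified Ideographs block.
    return "".join(re.findall(r"[\u4e00-\u9fff]+", text))

print(keep_chinese_only("Word2vec 是一種 NLP 模型, 2013年提出!"))  # → 是一種模型年提出
```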
2. Build the dictionary and replace rare words with the UNKNOWWORD token.
> Build the dictionary
> Replace rare words
Main steps:
* Set the vocabulary size for the training model
* Use collections.Counter().most_common() to select the most frequent words
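The dictionary step can be sketched as follows; `build_dictionary`, the toy token list, and the vocabulary size are illustrative, with index 0 reserved for the UNKNOWWORD placeholder.

```python
import collections

# Illustrative sketch of dictionary building, assuming a token list and
# a fixed vocabulary size; names here are not the repo's.
def build_dictionary(words, vocab_size):
    # Reserve index 0 for the rare-word placeholder.
    counts = [("UNKNOWWORD", 0)]
    counts.extend(collections.Counter(words).most_common(vocab_size - 1))
    dictionary = {word: index for index, (word, _) in enumerate(counts)}
    # Map every rare word (absent from the dictionary) to UNKNOWWORD's index.
    data = [dictionary.get(word, 0) for word in words]
    return data, dictionary

words = ["我", "愛", "我", "的", "貓", "貓", "狗"]
data, dictionary = build_dictionary(words, vocab_size=4)
# data → [1, 3, 1, 0, 2, 2, 0]: "的" and "狗" fall below the cutoff
```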
3. Generate training batches for the skip-gram model.
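A minimal sketch of skip-gram batch generation, assuming (centre, context) pairs are drawn from a sliding window over the integer-encoded data; the repo's generator (batch size, number of skips) may differ.

```python
# For each centre word, emit (centre, context) pairs from a window
# around it. Window size is an illustrative parameter.
def generate_batch(data, window_size=1):
    batch, labels = [], []
    for i, center in enumerate(data):
        for j in range(max(0, i - window_size), min(len(data), i + window_size + 1)):
            if j != i:
                batch.append(center)    # centre word
                labels.append(data[j])  # a context word from the window
    return batch, labels

batch, labels = generate_batch([0, 1, 2, 3], window_size=1)
# batch  → [0, 1, 1, 2, 2, 3]
# labels → [1, 0, 2, 1, 3, 2]
```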
4. Build and train a skip-gram model.
> Loss: tf.nn.nce_loss()
> Optimizer: tf.train.AdamOptimizer(learning_rate=1.0).minimize()
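As a rough illustration of what an NCE-style loss optimises, here is a toy negative-sampling update in NumPy rather than the repo's TensorFlow graph; vocabulary size, embedding dimension, learning rate, and the sampled negatives are all arbitrary choices.

```python
import numpy as np

# Toy sketch of the idea behind NCE / negative sampling: for a
# (centre, context) pair, raise the true pair's score and lower the
# scores of a few sampled negative words.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 8
emb_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # centre-word vectors
emb_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, negatives, lr=0.1):
    """One negative-sampling update for a single (centre, context) pair."""
    v = emb_in[center].copy()
    grad_v = np.zeros(dim)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(v @ emb_out[word]) - label   # logistic-loss gradient
        grad_v += g * emb_out[word]
        emb_out[word] -= lr * g * v
    emb_in[center] -= lr * grad_v

before = sigmoid(emb_in[2] @ emb_out[5])
for _ in range(20):
    sgns_step(center=2, context=5, negatives=[1, 7])
after = sigmoid(emb_in[2] @ emb_out[5])
# after > before: repeated updates raise the true pair's probability
```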
5. Begin training
> Training stage
> TensorBoard logs (written to word2vec/TB/)
> Output to a JSON text file: result_Json
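The JSON export step can be sketched as below; the embedding values and the exact result_Json layout are assumptions, not the repo's actual format.

```python
import json

# Hypothetical sketch: map each word to its learned vector and write
# the result as a JSON text file, then read it back.
embeddings = {"我": [0.12, -0.30], "貓": [0.05, 0.44]}  # toy 2-d vectors
with open("result_Json", "w", encoding="utf-8") as f:
    json.dump(embeddings, f, ensure_ascii=False)

with open("result_Json", encoding="utf-8") as f:
    restored = json.load(f)
```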
Getting the data from websites:
1: Scientific articles
2: Positive and negative reviews
Testing the NLP model
Settings for Traditional Chinese
Converting Simplified to Traditional Chinese
Implementation of the TextRank model
Implementation of the tf-idf model
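A hedged sketch of TextRank keyword extraction: build a co-occurrence graph over words within a sliding window, then run a PageRank-style power iteration. Window size, damping factor, and iteration count are illustrative choices, not the repo's settings.

```python
# Co-occurrence graph + power iteration; all parameters are illustrative.
def textrank(words, window=2, damping=0.85, iters=50):
    # Undirected co-occurrence edges within the window.
    neighbours = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbours[w].add(words[j])
                neighbours[words[j]].add(w)
    nodes = list(neighbours)
    rank = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iters):
        # PageRank-style update: a word inherits rank from its neighbours.
        rank = {
            w: (1 - damping) / len(nodes) + damping * sum(
                rank[u] / len(neighbours[u]) for u in neighbours[w] if neighbours[u]
            )
            for w in nodes
        }
    return rank

rank = textrank(["貓", "愛", "魚", "貓", "追", "魚"])
# The most connected words ("貓", "魚") score highest
```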
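A minimal tf-idf sketch over tokenised documents; the weighting variant used here (raw term frequency, unsmoothed idf) is an assumption about the repo's implementation.

```python
import math

# Score each term in each document by term frequency times inverse
# document frequency.
def tf_idf(docs):
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in docs:
        tf = {t: doc.count(t) / len(doc) for t in set(doc)}
        scores.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return scores

docs = [["貓", "狗", "貓"], ["狗", "魚"]]
scores = tf_idf(docs)
# "貓" appears only in the first document, so it gets a positive score
# there; "狗" appears in every document, so its idf (and score) is zero.
```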