Merge pull request HIT-SCIR#259 from HIT-SCIR/develop
fix bugs and typo mistakes
liu946 authored Oct 23, 2017
2 parents 1b217aa + bc742ed commit a6c1b0e
Showing 15 changed files with 84 additions and 181 deletions.
17 changes: 15 additions & 2 deletions README.md
@@ -8,6 +8,19 @@
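News
----

LTP-Cloud has switched entirely to `HTTPS`
* For security reasons, the cloud API platform for this repository, [LTP-Cloud](https://www.ltp-cloud.com/), is now served exclusively over `HTTPS`.
* Users who previously called the cloud platform's API should switch to `HTTPS` for both site access and API calls.
* If the platform's web pages fail to open, clearing the browser cache should resolve the problem.
* If a newly registered `apikey` does not work because of the migration, please contact the administrator.

LTP official website launched
* The [LTP official website](http://ltp.ai/) recently went online.
* Visit the site to [read the documentation](http://ltp.ai/docs/index.html), [download models](http://ltp.ai/download.html), or [try the online demo](http://ltp.ai/demo.html).

Models moved to Qiniu Cloud
* Many users recently reported that downloading models from Baidu Cloud was very slow. The models are now hosted on Qiniu Cloud; visit [this link](http://ltp.ai/download.html) to choose a version to download.

LTP 3.4.0 released
* [Added] A new Bi-LSTM-based SRL model
* [Added] A multithreaded SRL command-line program, `srl_cmdline`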
@@ -47,12 +60,12 @@
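Documentation
---

For LTP usage, see the [online documentation](http://ltp.readthedocs.io/zh_CN/latest/)
For LTP usage, see the [online documentation](http://ltp.ai/docs/index.html)

Models
---

* [Baidu Cloud](http://pan.baidu.com/share/link?shareid=1988562907&uk=2738088569)
* [Download links for all versions](http://ltp.ai/download.html)
* Current model version: 3.4.0

Other language bindings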
10 changes: 5 additions & 5 deletions src/console/ltp_test.cpp
@@ -100,8 +100,8 @@ int main(int argc, char *argv[]) {
"The path to the NER model [default=ltp_data/ner.model].")
("parser-model", value<std::string>(),
"The path to the parser model [default=ltp_data/parser.model].")
("srl-data", value<std::string>(),
"The path to the SRL model directory [default=ltp_data/srl_data/].")
("srl-model", value<std::string>(),
"The path to the SRL model [default=ltp_data/pisrl.model].")
("debug-level", value<int>(), "The debug level.")
("help,h", "Show help information");

@@ -184,13 +184,13 @@ int main(int argc, char *argv[]) {
parser_model= vm["parser-model"].as<std::string>();
}

std::string srl_data= "ltp_data/srl/";
std::string srl_model= "ltp_data/pisrl.model";
if (vm.count("srl-model")) {
srl_data = vm["srl-data"].as<std::string>();
srl_model = vm["srl-model"].as<std::string>();
}

LTP engine(last_stage, segmentor_model, segmentor_lexicon, postagger_model,
postagger_lexcion, ner_model, parser_model, srl_data);
postagger_lexcion, ner_model, parser_model, srl_model);

if (!engine.loaded()) {
std::cerr << "Failed to load LTP" << std::endl;
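The option handling in this hunk follows one pattern: start from a default path and override it only when the flag was given. A minimal sketch of that pattern without Boost is shown here. `Options` and `option_or` are hypothetical names introduced for illustration; they are not part of LTP or Boost.Program_options. The point of the helper is that the key that is tested and the key that is read are always the same string.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for boost::program_options::variables_map.
using Options = std::map<std::string, std::string>;

// Returns the value stored under `key`, or `fallback` when the option
// was not supplied. Looking the key up once avoids testing one option
// name and reading another.
std::string option_or(const Options& vm, const std::string& key,
                      const std::string& fallback) {
  auto it = vm.find(key);
  return it == vm.end() ? fallback : it->second;
}
```

With such a helper, the SRL model path would be resolved as `option_or(vm, "srl-model", "ltp_data/pisrl.model")`.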
5 changes: 3 additions & 2 deletions src/console/srl_cmdline.cpp
@@ -100,9 +100,10 @@ int main(int argc, char ** argv) {
("input", value<std::string>(), "The path to the input file. "
"Input data should contain one word per line. "
"Sentences should be separated by a blank line. "
"! Note that indices start at 0, and index(ROOT)=-1 in the third column. "
"(e.g. \"中国 ns 2 ATT\").")
("pisrl-model", value<std::string>(),
"The path to the pi-srl joint model [default=ltp_data/pos.model].")
"The path to the pi-srl joint model [default=ltp_data/pisrl.model].")
("help,h", "Show help information");

if (argc == 1) {
@@ -130,7 +131,7 @@ int main(int argc, char ** argv) {
std::string input = "";
if (vm.count("input")) { input = vm["input"].as<std::string>(); }

std::string srl_model = "ltp_data/pos.model";
std::string srl_model = "ltp_data/pisrl.model";
if (vm.count("pisrl-model")) {
srl_model = vm["pisrl-model"].as<std::string>();
}
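The help text describes the input as one word per line with the head index in the third column, 0-based, with ROOT encoded as -1. The following sketch parses one such line. `Token` and `parse_token` are hypothetical names, and the assumption that columns are separated by whitespace is taken from the help-text example only; the real reader in `srl_cmdline` is not shown in this diff.

```cpp
#include <sstream>
#include <string>

// One input row: word form, POS tag, 0-based head index (-1 for ROOT),
// and dependency relation. Hypothetical type for illustration.
struct Token {
  std::string form;
  std::string pos;
  int head;
  std::string rel;
};

// Parses a whitespace-separated line such as "中国 ns 2 ATT".
// Returns false when a column is missing, the head is not an integer,
// or the head index is below -1.
bool parse_token(const std::string& line, Token& out) {
  std::istringstream iss(line);
  if (!(iss >> out.form >> out.pos >> out.head >> out.rel)) return false;
  return out.head >= -1;
}
```

A sentence is then a run of successfully parsed lines terminated by a blank line.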
24 changes: 23 additions & 1 deletion src/segmentor/segmentor.cpp
@@ -233,10 +233,32 @@ void Segmentor::load_lexicon(const char* filename, Model::lexicon_t* lexicon) co
std::ifstream ifs(filename);
if (!ifs.good()) { return; }
std::string line;
bool updated;
std::string full;
std::string tmp;
while (std::getline(ifs, line)) {
trim(line);
std::string form = line.substr(0, line.find_first_of(" \t"));
lexicon->set(form.c_str(), true);
updated = false;
for(int index=0; index<form.size();) {
if((form[index] & 0x80) == 0) {
if(!updated)
full = form.substr(0, index);
strutils::chartypes::sbc2dbc(form.substr(index, 1), tmp);
full += tmp;
index += 1;
updated = true;
} else if ((form[index] & 0xE0) == 0xC0) index += 2;
else if ((form[index] & 0xF0) == 0xE0) index += 3;
else if ((form[index] & 0xF8) == 0xF0) index += 4;
else if ((form[index] & 0xFC) == 0xF8) index += 5;
else if ((form[index] & 0xFE) == 0xFC) index += 6;
else {
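ERROR_LOG("Unknown character prefix : 0x%x @ %s\n", static_cast<unsigned char>(form[index]), form.c_str());
index += 1; // skip the invalid byte; a bare `continue` here never advances and would loop forever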
}
}
lexicon->set(updated?full.c_str():form.c_str(), true);
}
INFO_LOG("loaded %d lexicon entries", lexicon->size());
}
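The loop in this hunk walks a UTF-8 string by lead byte and converts the single-byte (ASCII) characters of each lexicon entry. A self-contained sketch of the same idea follows. It rests on several assumptions: `utf8_len`, `to_fullwidth`, and `normalize_entry` are hypothetical names; `to_fullwidth` is only a stand-in that maps ASCII to full-width forms, since the behaviour of `strutils::chartypes::sbc2dbc` is not shown in this diff; and only the 1 to 4 byte UTF-8 forms are accepted, unlike the diff, which also accepts the legacy 5- and 6-byte forms.

```cpp
#include <cstddef>
#include <string>

// Byte length of the UTF-8 sequence introduced by lead byte `c`,
// or 0 when `c` cannot start a sequence.
std::size_t utf8_len(unsigned char c) {
  if ((c & 0x80) == 0x00) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 0;
}

// Stand-in conversion (an assumption, not LTP's sbc2dbc): maps printable
// ASCII to its full-width code point and encodes it as 3 UTF-8 bytes.
std::string to_fullwidth(char ch) {
  unsigned cp;
  if (ch == ' ') {
    cp = 0x3000;  // ideographic space
  } else if (ch >= 0x21 && ch <= 0x7E) {
    cp = static_cast<unsigned>(ch) + 0xFEE0;  // U+FF01..U+FF5E
  } else {
    return std::string(1, ch);  // control characters pass through
  }
  std::string out;
  out += static_cast<char>(0xE0 | (cp >> 12));
  out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
  out += static_cast<char>(0x80 | (cp & 0x3F));
  return out;
}

// Converts ASCII characters and copies every multi-byte character
// unchanged. The index advances on every iteration, including on an
// invalid lead byte, so the loop always terminates.
std::string normalize_entry(const std::string& form) {
  std::string out;
  for (std::size_t i = 0; i < form.size();) {
    std::size_t len = utf8_len(static_cast<unsigned char>(form[i]));
    if (len == 1) {
      out += to_fullwidth(form[i]);
    } else {
      if (len == 0) len = 1;     // invalid byte: copy it and move on
      out.append(form, i, len);  // clamps at the end of the string
    }
    i += len;
  }
  return out;
}
```

Building the output character by character keeps multi-byte characters that follow an ASCII character, and it removes the need for a separate `updated` flag.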
6 changes: 4 additions & 2 deletions src/server/ltp_server.cpp
@@ -521,10 +521,12 @@ static int Service(struct mg_connection *conn) {
TRACE_LOG("Analysis is done.");

std::string strResult;
if (str_format == "xml") {
if (str_format == "xml") { //xml
xml4nlp.SaveDOM(strResult);
} else { //json
} else if (str_format == "json") { //json
strResult = xml2jsonstr(xml4nlp, str_type);
} else { // if str_format not set, or is invalid, use xml
xml4nlp.SaveDOM(strResult);
}
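The branch above selects JSON only when the request asks for it explicitly and falls back to XML for `xml`, an empty value, or anything unrecognized. That rule can be isolated in a small function, sketched here. `OutputFormat` and `select_format` are hypothetical names, not part of the LTP server.

```cpp
#include <string>

enum class OutputFormat { Xml, Json };

// Maps the requested format string to an output format.
// Anything other than "json" (including "xml", "", or a typo)
// yields XML, which mirrors the fallback in the hunk above.
OutputFormat select_format(const std::string& requested) {
  if (requested == "json") return OutputFormat::Json;
  return OutputFormat::Xml;
}
```

Keeping the default in one place means the two XML branches collapse into a single `SaveDOM` call site.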


7 changes: 6 additions & 1 deletion src/srl/DepSRL.cpp
@@ -18,6 +18,8 @@

using namespace std;

std::mutex DepSRL::mtx;

// Load necessary resources into memory
int DepSRL::LoadResource(const string &modelFile)
{
@@ -69,22 +71,25 @@ int DepSRL::GetSRLResult(
sentence.push_back(word);
}
// pi prediction
mtx.lock();
{
ComputationGraph hg;
vector<Expression> adists = pi_model->label(hg, sentence);
pi_model->ExtractResults(hg, adists, sentence);
}
mtx.unlock();
if ( !sentence.getPredicateList().size() ) {
// skip all processing if no predicate
return 0;
}
// srl prediction
mtx.lock();
{
ComputationGraph hg;
vector<Expression> adists = srl_model->label(hg, sentence);
srl_model->ExtractResults(hg, adists, sentence);
}

mtx.unlock();
if (!FormResult(words, POSs, sentence.getPredicateList(), sentence, vecSRLResult))
return -1;
return 0;
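The hunk above serializes the two prediction steps with explicit `mtx.lock()` and `mtx.unlock()` calls. If the code between them throws, the mutex stays locked. One alternative, sketched here as an option and not as what the commit does, is a scoped `std::lock_guard`. `graph_mtx` and `run_exclusive` are hypothetical names; `graph_mtx` stands in for `DepSRL::mtx`.

```cpp
#include <functional>
#include <mutex>

// Hypothetical stand-in for the static DepSRL::mtx.
std::mutex graph_mtx;

// Runs `step` while holding the mutex. The guard releases the lock when
// the function returns, whether normally or through an exception.
int run_exclusive(const std::function<int()>& step) {
  std::lock_guard<std::mutex> guard(graph_mtx);
  return step();
}
```

In `GetSRLResult`, the same effect could be had by declaring the guard as the first statement inside each braced block, so the lock and the `ComputationGraph` share one scope.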
2 changes: 2 additions & 0 deletions src/srl/DepSRL.h
@@ -20,6 +20,7 @@
#include "Pi/model/SrlPiModel.h"
#include "Srl/model/SrlSrlModel.h"
#include "structure/WordEmbBuilder.h"
#include <mutex>

class DepSRL {

@@ -93,6 +94,7 @@ class DepSRL {
SrlSrlModel * srl_model;
PiModel * pi_model;
unordered_map<string, vector<float>> embedding;
static std::mutex mtx; // works around DyNet's single-ComputationGraph constraint.
private:
void manageConfigPath(ModelConf &config, const string &dirPath);

7 changes: 6 additions & 1 deletion tools/autotest/request.py
@@ -92,6 +92,7 @@ def NormalTest():
optparser.add_option("--case", dest="error_cases", action="store_true",
default=False, help="specify case test")
optparser.add_option("--file", dest="filename", help="specify the file")
optparser.add_option("--sentence-batch", dest="stn_batch", type="int", default=1, help="number of sentences per request")
opts, args = optparser.parse_args()

if opts.error_cases:
@@ -112,8 +113,12 @@ def TEST(function, name):
print >> sys.stderr, "Failed to open file, use stdin instead"
fp=sys.stdin

buffer=[]
for line in fp:
try:
print Request(line.strip(), 'n')
buffer.append(line.strip())
if len(buffer) == opts.stn_batch:
print Request('\n'.join(buffer), 'n')
buffer=[]
except Exception, e:
print e
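The new loop groups input lines into batches and sends each batch as one newline-joined request. Two details matter for such a loop: the batch size must be an integer for the length comparison to ever succeed, and a final partial batch has to be flushed after the loop. The sketch below shows both in C++ for consistency with the other examples here. `batch_sentences` is a hypothetical name, and the function only builds the request bodies; it does not send them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins consecutive lines into newline-separated request bodies of
// `batch_size` lines each. The last body may hold fewer lines.
std::vector<std::string> batch_sentences(const std::vector<std::string>& lines,
                                         std::size_t batch_size) {
  assert(batch_size > 0);
  std::vector<std::string> requests;
  std::string current;
  std::size_t count = 0;
  for (const std::string& line : lines) {
    if (count > 0) current += '\n';
    current += line;
    if (++count == batch_size) {
      requests.push_back(current);
      current.clear();
      count = 0;
    }
  }
  if (count > 0) requests.push_back(current);  // flush the partial batch
  return requests;
}
```

Without the final flush, any sentences left over when the input length is not a multiple of the batch size would never be sent.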
116 changes: 0 additions & 116 deletions tools/train/conf/srl/assets/Chinese.xml

This file was deleted.

20 changes: 0 additions & 20 deletions tools/train/conf/srl/assets/srl.cfg

This file was deleted.

12 changes: 0 additions & 12 deletions tools/train/conf/srl/srl-prg.cnf

This file was deleted.

14 changes: 0 additions & 14 deletions tools/train/conf/srl/srl-srl.cnf

This file was deleted.

5 changes: 0 additions & 5 deletions tools/train/conf/srl/srl-test.cnf

This file was deleted.

9 changes: 9 additions & 0 deletions tools/train/sample/srl/example-pi.config
@@ -0,0 +1,9 @@
model=pi.model
activate=tanh
word_dim=100
emb_dim=200
pos_dim=20
rel_dim=50
lstm_input_dim=100
lstm_hidden_dim=200
layers=1
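The sample configuration is a flat list of `key=value` lines. A minimal reader for that shape is sketched below. `parse_config` is a hypothetical helper; the actual loader used by the SRL training tools is not part of this diff and may treat comments, whitespace, or types differently.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Parses "key=value" lines into a map. Lines without '=' or with an
// empty key are skipped. Values are kept as strings; callers convert
// numeric fields themselves (for example with std::stoi).
std::map<std::string, std::string> parse_config(const std::string& text) {
  std::map<std::string, std::string> conf;
  std::istringstream iss(text);
  std::string line;
  while (std::getline(iss, line)) {
    std::size_t eq = line.find('=');
    if (eq == std::string::npos || eq == 0) continue;
    conf[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return conf;
}
```

For the file above, `std::stoi(conf["lstm_hidden_dim"])` would give the hidden size as an integer, while `conf["activate"]` stays a string.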