Add new models and fix some existing issues #28

Merged
merged 11 commits into from
Dec 14, 2024
Replace pad_sequences with the HuggingFace tokenizer's automatic padding
DeBruyxuan committed Nov 23, 2024
commit dee688622e2a1aa231468a8579fa499997088747
7 changes: 3 additions & 4 deletions backend/fsl/src/added/bert_finetune.py
@@ -240,10 +240,9 @@ def accuracy(labels, preds):
 sentences_tokens = [tokenizer.tokenize(sen) for sen in sentences]
 # token --> id
 sentence_ids = [tokenizer.convert_tokens_to_ids(sen) for sen in sentences_tokens]
-# Pad every sentence to the fixed length max_len
-sentence_ids = pad_sequences(sentence_ids, maxlen=max_len, dtype='long', truncating='post', padding='post')
-# Build the attention mask from sentence_ids
-attention_mask = [[1 if id > 0 else 0 for id in sen] for sen in sentence_ids]
+encoded_data = tokenizer(sentences, padding="max_length", truncation=True, max_length=max_len, return_tensors="pt")
+sentence_ids = encoded_data["input_ids"]
+attention_mask = encoded_data["attention_mask"]
 print(attention_mask[0])
 # Data type conversion: turn everything into tensors
 sentence_ids = torch.tensor(sentence_ids)
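For reference, a minimal sketch of the pattern this commit adopts: a single tokenizer call now handles tokenization, id conversion, padding, truncation, and attention-mask construction. The checkpoint name here is only an assumption; the diff does not show which model the repo actually loads.

# Minimal sketch of the pattern adopted in this commit.
# Assumption: the checkpoint name is hypothetical; the PR does not show the repo's model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # hypothetical checkpoint
sentences = ["第一句话。", "这是第二句话,比第一句要长一些。"]
max_len = 16

# One call replaces tokenize + convert_tokens_to_ids + pad_sequences + the manual mask:
encoded = tokenizer(
    sentences,
    padding="max_length",  # pad every sequence up to max_length
    truncation=True,       # cut off sequences longer than max_length
    max_length=max_len,
    return_tensors="pt",   # return torch tensors directly
)
print(encoded["input_ids"].shape)    # torch.Size([2, 16])
print(encoded["attention_mask"][0])  # 1s over real tokens, 0s over the padding

Note that with return_tensors="pt" the returned input_ids and attention_mask are already torch tensors, so the unchanged torch.tensor(sentence_ids) conversion just below the hunk becomes redundant (PyTorch warns and recommends clone().detach() when re-wrapping an existing tensor).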