Skip to content

Latest commit

 

History

History
138 lines (110 loc) · 6.35 KB

QuickTour.md

File metadata and controls

138 lines (110 loc) · 6.35 KB

Quick tour

Quckly use

We provide many models which are trained to perform different tasks. You can load these models by AutoLoader to make prediction.

Ner task

import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

task_name = "sequence-labeling"
model_name = "roberta_ner"
target = ["O", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-PER", "I-PER"]
maxlen = 256

auto_loader = AutoLoader(task_name,
                         model_name=model_name,
                         load_pretrain_params=True,
                         target_size=len(target))

model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

predictor = Predictor(model, tokenizer)

test_data = [
    "6月15日,河南省文物考古研究所曹操高陵文物队公开发表声明承认:“从来没有说过出土的珠子是墓主人的",
    "4月8日,北京冬奥会、冬残奥会总结表彰大会在人民大会堂隆重举行。习近平总书记出席大会并发表重要讲话。在讲话中,总书记充分肯定了北京冬奥会、冬残奥会取得的优异成绩,全面回顾了7年筹办备赛的不凡历程,深入总结了筹备举办北京冬奥会、冬残奥会的宝贵经验,深刻阐释了北京冬奥精神,对运用好冬奥遗产推动高质量发展提出明确要求。",
    "当地时间8日,欧盟委员会表示,欧盟各成员国政府现已冻结共计约300亿欧元与俄罗斯寡头及其他被制裁的俄方人员有关的资产。",
    "这一盘口状态下英国必发公司亚洲盘交易数据显示博洛尼亚热。而从欧赔投注看,也是主队热。巴勒莫两连败,",
]

for t in test_data:
    entities = predictor.predict_ner(t, target, maxlen=maxlen)
    result = {}
    for e in entities:
        if e[2] not in result:
            result[e[2]] = [t[e[0]:e[1] + 1]]
        else:
            result[e[2]].append(t[e[0]:e[1] + 1])
    print(f"result is {result}")

Bert title generation task

import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
maxlen = 256

auto_loader = AutoLoader(
    task_name="seq2seq",
    model_name="roberta_title_generation",
    load_pretrain_params=True,
)
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

predictor = Predictor(model, tokenizer)

test_data = [
    "本文总结了十个可穿戴产品的设计原则而这些原则同样也是笔者认为是这个行业最吸引人的地方1为人们解决重复性问题2从人开始而不是从机器开始3要引起注意但不要刻意4提升用户能力而不是取代人",
    "2007年乔布斯向人们展示iPhone并宣称它将会改变世界还有人认为他在夸大其词然而在8年后以iPhone为代表的触屏智能手机已经席卷全球各个角落未来智能手机将会成为真正的个人电脑为人类发展做出更大的贡献",
    "雅虎发布2014年第四季度财报并推出了免税方式剥离其持有的阿里巴巴集团15%股权的计划打算将这一价值约400亿美元的宝贵投资分配给股东截止发稿前雅虎股价上涨了大约7%至5145美元"
]

for text in test_data:
    print(
        predictor.predict_generate_beamsearch(text,
                                              out_max_length=50,
                                              beam_size=3))

Semantic matching task

import torch
from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.predictor import Predictor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
maxlen = 256

auto_loader = AutoLoader("classification",
                         model_name="roberta-base-ch-semantic-matching",
                         load_pretrain_params=True,
                         target_size=2)
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

predictor = Predictor(model, tokenizer)

test_data = [["后悔了吗", "你有没有后悔"], ["打开自动横屏", "开启移动数据"],
             ["我觉得你很聪明", "你聪明我是这么觉得"]]

for text_pair in test_data:
    print(predictor.predict_cls_classifier(text_pair))

Load model and tokenizer

We provide the AutoLoad class to load the model and tokenizer quickly, for example:

from flagai.auto_model.auto_loader import AutoLoader
auto_loader = AutoLoader(task_name="classification",
                         model_name="BERT-base-ch",
                         load_pretrain_params=True,
                         target_size=2)
model = auto_loader.get_model()
tokenizer = auto_loader.get_tokenizer()

This example is for the Classification task, and you can also model other tasks by modifying the task_name. Target_size is the number of categories for a classification task. Then you can use the model and tokenizer to finetune or test.

Predictor

We provide the Predictor class to predict for different tasks, for example:

predictor = Predictor(model, tokenizer)
test_data = [
    "本文总结了十个可穿戴产品的设计原则而这些原则同样也是笔者认为是这个行业最吸引人的地方1为人们解决重复性问题2从人开始而不是从机器开始3要引起注意但不要刻意4提升用户能力而不是取代人",
    "2007年乔布斯向人们展示iPhone并宣称它将会改变世界还有人认为他在夸大其词然而在8年后以iPhone为代表的触屏智能手机已经席卷全球各个角落未来智能手机将会成为真正的个人电脑为人类发展做出更大的贡献",
    "雅虎发布2014年第四季度财报并推出了免税方式剥离其持有的阿里巴巴集团15%股权的计划打算将这一价值约400亿美元的宝贵投资分配给股东截止发稿前雅虎股价上涨了大约7%至5145美元"
]
for text in test_data:
    print(
        predictor.predict_generate_beamsearch(text,
                                              out_max_length=50,
                                              beam_size=3))

This example is for the seq2seq task, we can get the beam-search result by calling the "predict_generate_beamsearch" fuction.In addition, we also support the prediction of tasks such as NER and classification.