add test sets from open-sourced dataset
dophist committed Aug 7, 2021
1 parent d73d4ea commit 69bf7bc
Showing 4 changed files with 30 additions and 9 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
 SAFEBOX/*.cfg
 datasets/download
 datasets/SPEECHIO_*
+datasets/*_TEST
 oss
 tmp.sh
 test_env
10 changes: 5 additions & 5 deletions HOW_TO_SUBMIT.md
@@ -10,18 +10,18 @@ As above figure demonstrates, a benchmark cycle contains following steps:

 ---

-## Step 1. Prepare model dir for submission
+## Step 1. Prepare submission model dir

 Conceptually, for leaderboard to re-produce and benchmark submitters' ASR system, submitters need to provide at least 3 things:
 * system dependencies (operation system, softwares, libraries, packages)
 * runtime resources (e.g. model, config, cloud-api credentials)
 * a program that can decode local audio list

-A sample submission `model directory` is listed below:
+So the main purpose of leaderboard pipeline is to formalize above aspects down to a concrete contract. Let's start with `submission model dir`:
 ```
-jiayu@ubuntu: tree ./sample_model_directory
+jiayu@ubuntu: tree ./sample_submission_model_dir
-sample_model_directory
+sample_submission_model_dir
 ├── docker
 │ └── Dockerfile
 ├── model.yaml
@@ -33,7 +33,7 @@ sample_model_directory
 ---

 ### 1.1 `docker/Dockerfile`
-Dockerfile is used to construct your runtime envrionment for benchmarking, it should specifies all dependencies of your ASR system.
+Dockerfile serves as a specification of your ASR runtime environment, pipeline will build docker image to reproduce your system on local machine. Here, `runtime` can be a cloud-API client, or a local ASR engine.

 <details><summary> cloud-API ASR Dockerfile example </summary><p>

19 changes: 15 additions & 4 deletions README.md
@@ -32,9 +32,20 @@ With SpeechIO leaderboard, anyone can benchmark/reproduce/compare performances w
 ---

 ## 2. TestSet Zoo
-<details><summary> Test Sets (ZH) </summary><p>
+<details><summary> Test Sets from Open-Sourced Dataset (ZH) </summary><p>

-| 编号 <br> ID | 名称 <br> Name |场景 <br> Scenario | 内容领域 <br> Topic Domain | 时长 <br> hours | 难度(1-5) <br> Difficulty |
+| 编号 <br> TEST_SET_ID | 说明 <br> DESCRIPTION |
+| --- | --- |
+| AISHELL-1_TEST | test set of AISHELL-1 |
+| AISHELL-2_IOS_TEST | test set of AISHELL-2 (iOS channel) |
+| AISHELL-2_ANDROID_TEST | test set of AISHELL-2 (Android channel) |
+| AISHELL-2_MIC_TEST | test set of AISHELL-2 (Microphone channel) |
+
+</p></details>
+
+<details><summary> SpeechIO Test Sets (ZH) </summary><p>
+
+| 编号 <br> TEST_SET_ID | 名称 <br> Name |场景 <br> Scenario | 内容领域 <br> Topic Domain | 时长 <br> hours | 难度(1-5) <br> Difficulty |
 | --- | --- | --- | --- | --- | --- |
 |SPEECHIO_ASR_ZH00000| 接入调试集 <br> For leaderboard submitter debugging | 视频会议、论坛演讲 <br> video conference & forum speech | 经济、货币、金融 <br> economy, currency, finance | 1.0 | ★★☆ |
 |SPEECHIO_ASR_ZH00001| 新闻联播 | 新闻播报 <br> TV News | 时政 <br> news & politics | 9 ||
@@ -68,7 +79,7 @@ With SpeechIO leaderboard, anyone can benchmark/reproduce/compare performances w
 ---

 ## 3. Model Zoo
-<details><summary> Commercial Models (ZH) </summary><p>
+<details><summary> Commercial API (ZH) </summary><p>

 | 编号 <br> MODEL_ID | 类型 <br> type | 模型作者/所有人 <br> model author/owner | 简介 <br> description | 链接 <br> url |
 | --- | --- | --- | --- | --- |
@@ -82,7 +93,7 @@ With SpeechIO leaderboard, anyone can benchmark/reproduce/compare performances w
 |yitu_api | Cloud API |依图 <br> YituTech |依图语音开放平台| https://speech.yitutech.com |
 </p></details>

-<details><summary> Open-Sourced Models (ZH) </summary><p>
+<details><summary> Open-Sourced Pretrained Models (ZH) </summary><p>

 | 编号 <br> MODEL_ID | 类型 <br> type | 模型作者/所有人 <br> model author/owner | 简介 <br> description | 链接 <br> url |
 | --- | --- | --- | --- | --- |
9 changes: 9 additions & 0 deletions datasets/run_kaldi_to_speechio.sh
@@ -1,4 +1,13 @@
# mini debug test set
./kaldi_to_speechio.py download/MINI MINI

# open-sourced test sets
./kaldi_to_speechio.py download/AISHELL-1_test AISHELL-1_TEST
./kaldi_to_speechio.py download/AISHELL-2_iOS_test AISHELL-2_IOS_TEST
./kaldi_to_speechio.py download/AISHELL-2_Android_test AISHELL-2_ANDROID_TEST
./kaldi_to_speechio.py download/AISHELL-2_Mic_test AISHELL-2_MIC_TEST

# SpeechIO test sets
./kaldi_to_speechio.py download/economy_finance_currency SPEECHIO_ASR_ZH00000
./kaldi_to_speechio.py download/cctv_news SPEECHIO_ASR_ZH00001
./kaldi_to_speechio.py download/luyu_yirixing SPEECHIO_ASR_ZH00002
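
In the open-sourced block of `run_kaldi_to_speechio.sh`, every test-set ID is exactly the upper-cased name of its download directory (`AISHELL-2_iOS_test` becomes `AISHELL-2_IOS_TEST`), which also matches the new `datasets/*_TEST` ignore rule. The sketch below is hypothetical and not part of this commit: `test_set_id` and `convert_open_sourced_sets` are illustrative names. It assumes bash 4 or newer (for the `${var^^}` case conversion) and that it is run from `datasets/`, alongside `kaldi_to_speechio.py` and `download/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): reproduces the "open-sourced test sets"
# block of run_kaldi_to_speechio.sh. Assumes bash >= 4 for ${var^^} and that it is
# executed from datasets/, next to kaldi_to_speechio.py and download/.

# Print the SpeechIO test-set ID for a Kaldi download dir name by upper-casing it,
# e.g. AISHELL-2_iOS_test -> AISHELL-2_IOS_TEST.
test_set_id() {
  printf '%s\n' "${1^^}"
}

# Convert every open-sourced test set, using the same argument order as the
# original script: <kaldi data dir> <test set id>. Stops at the first failure.
convert_open_sourced_sets() {
  local src
  for src in AISHELL-1_test AISHELL-2_iOS_test AISHELL-2_Android_test AISHELL-2_Mic_test; do
    ./kaldi_to_speechio.py "download/${src}" "$(test_set_id "${src}")" || return 1
  done
}
```

After sourcing the snippet, calling `convert_open_sourced_sets` should issue the same four commands as the hand-written lines. The SpeechIO sets keep explicit IDs, since names such as `cctv_news` do not map to `SPEECHIO_ASR_ZH00001` by any rule visible in this script.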