add ASR inference
78 committed Sep 7, 2024
1 parent 1c891b5 commit 4cf4a06
Showing 5 changed files with 394 additions and 128 deletions.
132 changes: 4 additions & 128 deletions .gitignore
@@ -1,130 +1,6 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
.pnpm-debug.log*

# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json

# Runtime data
pids
*.pid
*.seed
*.pid.lock

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage
*.lcov

# nyc test coverage
.nyc_output

# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# Bower dependency directory (https://bower.io/)
bower_components

# node-waf configuration
.lock-wscript

# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release

# Dependency directories
node_modules/
jspm_packages/

# Snowpack dependency directory (https://snowpack.dev/)
web_modules/

# TypeScript cache
*.tsbuildinfo

# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache

# Optional stylelint cache
.stylelintcache

# Microbundle cache
.rpt2_cache/
.rts2_cache_cjs/
.rts2_cache_es/
.rts2_cache_umd/

# Optional REPL history
.node_repl_history

# Output of 'npm pack'
*.tgz

# Yarn Integrity file
.yarn-integrity

# dotenv environment variable files
tmp/
.env
.env.development.local
.env.test.local
.env.production.local
.env.local

# parcel-bundler cache (https://parceljs.org/)
.cache
.parcel-cache

# Next.js build output
.next
out

# Nuxt.js build / generate output
.nuxt
dist

# Gatsby files
.cache/
# Comment in the public line in if your project uses Gatsby and not Next.js
# https://nextjs.org/blog/next-9-1#public-directory-support
# public

# vuepress build output
.vuepress/dist

# vuepress v2.x temp and cache directory
.temp
.cache

# Docusaurus cache and generated files
.docusaurus

# Serverless directories
.serverless/

# FuseBox cache
.fusebox/

# DynamoDB Local files
.dynamodb/

# TernJS port file
.tern-port

# Stores VSCode versions used for testing VSCode extensions
.vscode-test

# yarn v2
.yarn/cache
.yarn/unplugged
.yarn/build-state.yml
.yarn/install-state.gz
.pnp.*
package-lock.json
TODO.md
*.pyc
79 changes: 79 additions & 0 deletions inference/README.md
@@ -0,0 +1,79 @@
# ASR Task Client

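This service receives speech recognition tasks and sends the recognition results back to the caller.

It can run in an environment without a public IP address: it connects to the task dispatch server over WebSocket, which makes it easy to scale out quickly.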


### Open-source models used:

VAD: <a href="https://modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch">speech_fsmn_vad_zh-cn-16k-common-pytorch</a>

ASR: <a href="https://modelscope.cn/models/iic/SenseVoiceSmall">SenseVoiceSmall</a>

Embedding: <a href="https://modelscope.cn/models/iic/speech_eres2netv2w24s4ep4_sv_zh-cn_16k-common">ERes2NetV2_w24s4ep4</a>

### Performance:

VAD < 1 ms, ASR < 50 ms, Embedding < 50 ms


## Hardware requirements

GPU memory: 4 GB

Running on CPU makes each request take 5 to 10 times longer.

## Runtime environment

The following environment variable must be set:

ASR_TASK_SERVER_URL=wss://



```bash
conda create -n xiaozhi python=3.12
conda activate xiaozhi

pip install -r requirements.txt
python asr_task_client.py
```
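
Because the client cannot do anything useful without a server address, it can help to check `ASR_TASK_SERVER_URL` before any models are loaded. The following is a minimal sketch, not the code in `asr_task_client.py`: `load_server_url` is a hypothetical helper, and the rule that the URL must use the `ws` or `wss` scheme and name a host is an assumption.

```python
import os
from urllib.parse import urlparse


def load_server_url(env=os.environ):
    """Return the task server URL, or raise if it is missing or malformed.

    Hypothetical helper: the real client may read the variable differently.
    """
    url = env.get("ASR_TASK_SERVER_URL", "")
    parsed = urlparse(url)
    # Assumption: only WebSocket schemes with a host part are acceptable.
    if parsed.scheme not in ("ws", "wss") or not parsed.netloc:
        raise ValueError(
            "ASR_TASK_SERVER_URL must be a ws:// or wss:// URL with a host"
        )
    return url
```

Failing at startup with a clear message is usually easier to diagnose than a connection error raised later from inside the WebSocket library.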

## WebSocket protocol

### Sending requests

JSON:

1. detect: start a detection task

```json
{ "session_id": "xxx", "type": "detect", "words": "小智" }
```

2. finish: finish the detection task

```json
{ "session_id": "xxx", "type": "finish" }
```
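
The two JSON control messages above can be built with the standard `json` module. This is a small sketch; `detect_message` and `finish_message` are hypothetical helper names, not part of the repository.

```python
import json


def detect_message(session_id, words):
    """Build the JSON text that starts a detection task."""
    payload = {"session_id": session_id, "type": "detect", "words": words}
    # ensure_ascii=False keeps non-ASCII words such as "小智" readable on the wire.
    return json.dumps(payload, ensure_ascii=False)


def finish_message(session_id):
    """Build the JSON text that finishes a detection task."""
    return json.dumps({"session_id": session_id, "type": "finish"})
```

A caller would send `detect_message(...)` as a text frame, stream the audio as binary frames, then send `finish_message(...)` with the same `session_id`.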

Binary:

Each binary message contains two parts, the session_id and the PCM audio data, in the following format:

```
session_id length, uint32 big-endian
session_id string
pcm length, uint32 big-endian
pcm data
```
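
The binary layout above maps directly onto Python's `struct` module. The sketch below packs and unpacks one frame under three assumptions that the README does not state: both lengths are counted in bytes, `big` means big-endian, and the session_id is UTF-8 encoded. The function names are hypothetical.

```python
import struct


def pack_audio_frame(session_id: str, pcm: bytes) -> bytes:
    """Encode one binary message: two length-prefixed fields back to back."""
    sid = session_id.encode("utf-8")  # assumption: session_id is UTF-8
    # ">I" is an unsigned 32-bit integer in big-endian byte order.
    return struct.pack(">I", len(sid)) + sid + struct.pack(">I", len(pcm)) + pcm


def unpack_audio_frame(frame: bytes):
    """Decode one binary message into (session_id, pcm)."""
    (sid_len,) = struct.unpack_from(">I", frame, 0)
    sid_end = 4 + sid_len
    session_id = frame[4:sid_end].decode("utf-8")
    (pcm_len,) = struct.unpack_from(">I", frame, sid_end)
    pcm_start = sid_end + 4
    pcm = frame[pcm_start:pcm_start + pcm_len]
    if len(pcm) != pcm_len:
        raise ValueError("frame is shorter than its declared PCM length")
    return session_id, pcm
```

Carrying the session_id inside every frame lets a single WebSocket connection interleave audio from several sessions. The PCM sample rate and sample width are not specified in this section, so the sketch treats the audio as opaque bytes.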

### Receiving results

JSON:

```json
{ "session_id": "xxx", "type": "reply", "content": "text content", "embedding": "audio embedding vector", "url": "audio download URL" }
```
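
On the receiving side, a caller might decode the reply as follows. This is a sketch only: `parse_reply` is a hypothetical helper, and because the README does not say how the `embedding` value is encoded, it is passed through unchanged.

```python
import json


def parse_reply(text):
    """Return the fields of a reply message, or None for any other message type."""
    msg = json.loads(text)
    if msg.get("type") != "reply":
        return None
    return {
        "session_id": msg["session_id"],
        "content": msg.get("content"),      # recognized text
        "embedding": msg.get("embedding"),  # encoding unspecified, kept as-is
        "url": msg.get("url"),              # where the audio can be downloaded
    }
```

Returning `None` for unknown types keeps the caller's receive loop simple if the server later adds other message types.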
