Commit e4bd4b7

Update main structure
1 parent c83f2b2 commit e4bd4b7

15 files changed

+127
-566
lines changed

api-reference/available-faces.mdx

Whitespace-only changes.
+56
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,56 @@
1+
---
2+
title: 'LipsyncStream WebSocket'
3+
icon: 'server'
4+
---
5+
## Overview
6+
7+
This WebSocket endpoint takes a stream of PCM16 audio frames and returns a stream of lipsync frames in a custom binary format. Because the format is custom, you must build your own player; this document describes how to decode the response.
8+
9+
## Client Flowchart
10+
11+
![LipsyncStream.png](images/LipsyncStream.png)
12+
13+
## Initial Request Format
14+
15+
The first message sent after opening the WebSocket connection should be a JSON object with the following fields:
16+
17+
```json
18+
{
19+
  "video_reference_url": "CHARACTER_VIDEO_URL",
20+
  "face_det_results": "FACE_DETECTION_RESULTS",
21+
  "isJPG": true,
22+
  "syncAudio": true
23+
}
24+
```
25+
26+
- `video_reference_url`: The URL of the video file that the lipsync frames will be rendered on. This follows the format of `https://storage.googleapis.com/charactervideos/CHARACTER_ID/CHARACTER_ID.mp4`
27+
- `face_det_results`: The URL of the face detection results file. This follows the format of `https://storage.googleapis.com/charactervideos/CHARACTER_ID/CHARACTER_ID.pkl`
28+
- `isJPG`: A boolean that controls whether the video frames are encoded as JPG. If false, the frames are sent as a raw matrix of shape 512x512x3, which is not recommended for most cases.
29+
- `syncAudio`: When set to true, 34 ms of audio is sent back with each frame. This is useful for syncing the audio with the video. If set to false, only the frames are sent and you must sync the audio yourself.
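
The fields above can be assembled into the initial message roughly as follows. This is a minimal sketch: the helper name `build_initial_request` and the `character_id` argument are hypothetical, and it assumes both URLs follow the storage pattern quoted in the field descriptions.

```python
import json

# Base of the URL pattern quoted in the field descriptions above.
BUCKET = "https://storage.googleapis.com/charactervideos"


def build_initial_request(character_id: str,
                          is_jpg: bool = True,
                          sync_audio: bool = True) -> str:
    """Return the JSON text to send as the first WebSocket message.

    Hypothetical helper: it only fills in the documented URL pattern.
    """
    base = f"{BUCKET}/{character_id}/{character_id}"
    return json.dumps({
        "video_reference_url": f"{base}.mp4",
        "face_det_results": f"{base}.pkl",
        "isJPG": is_jpg,
        "syncAudio": sync_audio,
    })
```

The returned string would be sent once, as a text message, before any audio bytes.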
30+
31+
## Audio Input Format
32+
33+
Each audio frame is sent as the raw bytes of PCM16 audio. The audio is always 16000 Hz mono. The lipsync will be wrong if the audio sampling rate or channel count does not match. No other configurations are currently supported.
34+
Because each sample is 2 bytes, the number of bytes is always even. For example, 255 ms of audio is 16000 \* 2 \* 0.255 = 8160 bytes. The minimum audio chunk size is 250 ms, to avoid frequent WebSocket calls.
35+
36+
If there's nothing to send, you must send a zero-valued byte array covering 250 ms \(0.25 \* 2 \* 16000 = 8000 bytes\) to keep receiving frames. A no-input mode is not currently implemented, and not sending anything will break your app.
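
The byte-count arithmetic above can be checked with a short sketch. The function names `pcm16_byte_count` and `silence_chunk` are hypothetical; the constants come from the format described in this section.

```python
SAMPLE_RATE_HZ = 16000   # mono, as required by the endpoint
BYTES_PER_SAMPLE = 2     # PCM16 means 16 bits per sample
MIN_CHUNK_MS = 250       # documented minimum chunk duration


def pcm16_byte_count(duration_ms: int) -> int:
    """Number of bytes needed for duration_ms of 16 kHz mono PCM16."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def silence_chunk(duration_ms: int = MIN_CHUNK_MS) -> bytes:
    """Zero-valued audio to send when there is no real input."""
    return bytes(pcm16_byte_count(duration_ms))
```

Counting whole samples first and then multiplying by 2 keeps the result even, matching the note above.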
37+
38+
## Response Format
39+
40+
The response format changes based on the `syncAudio` field in the request. The response is a binary encoded format that follows this structure:
41+
42+
- 5 bytes (str): `VIDEO`
43+
- 4 bytes (int32): number of bytes in the video frame with the following format:
44+
- v bytes: Video frame with metadata
45+
  - 4 bytes (int32): frame index
46+
  - 4 bytes (int32): frame width
47+
  - 4 bytes (int32): frame height
48+
  - f bytes (byte array): video frame bytes
49+
- 5 bytes (str): `AUDIO`
50+
- 4 bytes (int32): number of bytes in the audio frame
51+
- a bytes (int16): audio frame
52+
53+
The video frame is a JPG image if `isJPG` is set to true, so it must be decoded with a JPG decoder suited to the language you are using. If set to false, the video frame will be a raw matrix representation with shape 512x512x3.
54+
Audio is always PCM 16-bit, 16000 Hz, mono.
55+
56+
If `syncAudio` is set to false, the response will only contain the video frame, which is specified in the `v bytes` field.
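
The layout above might be decoded along these lines. This is a sketch under stated assumptions, not a reference decoder: the document does not give the byte order of the int32 fields (little-endian is assumed here and can be overridden), `v` is assumed to include the 12 bytes of frame metadata, and each WebSocket message is assumed to carry exactly one response. `LipsyncFrame` and `parse_response` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class LipsyncFrame:
    index: int
    width: int
    height: int
    image: bytes            # JPG bytes, or a raw 512x512x3 matrix if isJPG is false
    audio: Optional[bytes]  # PCM16 bytes, or None when syncAudio is false


def parse_response(buf: bytes, byte_order: str = "<") -> LipsyncFrame:
    """Decode one binary response; byte_order "<" (little-endian) is an assumption."""
    if buf[:5] != b"VIDEO":
        raise ValueError("response does not start with the VIDEO tag")
    (v_len,) = struct.unpack_from(byte_order + "i", buf, 5)
    video = buf[9:9 + v_len]
    if v_len < 12 or len(video) != v_len:
        raise ValueError("truncated or malformed video section")
    # Assumed: the v bytes begin with index, width, height, then the image.
    index, width, height = struct.unpack_from(byte_order + "3i", video, 0)
    image = video[12:]

    audio = None
    pos = 9 + v_len
    if buf[pos:pos + 5] == b"AUDIO":  # only present when syncAudio is true
        (a_len,) = struct.unpack_from(byte_order + "i", buf, pos + 5)
        audio = buf[pos + 9:pos + 9 + a_len]
    return LipsyncFrame(index, width, height, image, audio)
```

If the decoded width and height come out implausible, the byte-order assumption is the first thing to revisit, for example by passing `byte_order=">"`.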
@@ -1,4 +1,4 @@
11
---
2-
title: 'Face IDs'
2+
title: '/getPossibleFaceIDs'
33
openapi: 'GET /getPossibleFaceIDs'
44
---

api-reference/endpoint/getPossibleVoiceIDs.mdx

-50
This file was deleted.
@@ -1,5 +1,5 @@
11
---
2-
title: 'Sessions Availability'
2+
title: '/isSessionAvailable'
33
description: "API to check sessions availability before starting a session"
44
openapi: 'POST /isSessionAvailable'
55
---
@@ -0,0 +1,5 @@
1+
---
2+
title: '/startAudioToVideoSession'
3+
description: "Start a session and get its session token"
4+
openapi: 'POST /startAudioToVideoSession'
5+
---

api-reference/endpoint/startSession.mdx

-7
This file was deleted.

api-reference/introduction.mdx

-70
This file was deleted.

api-reference/openapi.json

+1-115
@@ -7,7 +7,7 @@
77
},
88
"servers": [
99
{
10-
"url": "https://simli.com/api"
10+
"url": "https://api.simli.ai"
1111
}
1212
],
1313
"paths": {
@@ -82,120 +82,6 @@
8282
}
8383
}
8484
}
85-
},
86-
"/startSession": {
87-
"post": {
88-
"summary": "Start a session",
89-
"requestBody": {
90-
"content": {
91-
"application/json": {
92-
"schema": {
93-
"type": "object",
94-
"properties": {
95-
"apiKey": {
96-
"type": "string",
97-
"description": "Get your [API key](https://www.simli.com/sign-up-in)"
98-
},
99-
"faceId": {
100-
"type": "string",
101-
"description": "Character's face ID, see all possible face IDs [here](/api-reference/endpoint/getPossibleFaceIDs)"
102-
},
103-
"intro": {
104-
"type": "string",
105-
"description": "The character will say this at the beginning of the session"
106-
},
107-
"prompt": {
108-
"type": "string",
109-
"description": "Character interaction prompt, refer to [this guide](/quickstart#2-prompting) for more information"
110-
},
111-
"timeLimit": {
112-
"type": "object",
113-
"properties": {
114-
"limit": {
115-
"type": "integer",
116-
"description": "Session time limit in seconds, character will leave after this time"
117-
}
118-
}
119-
},
120-
"userName": {
121-
"type": "string",
122-
"description": "The character will refer to the user by this name"
123-
},
124-
"voiceId": {
125-
"type": "string",
126-
"description": "Character's voice ID, see all possible voice IDs [here](/api-reference/endpoint/getPossibleVoiceIDs)"
127-
}
128-
},
129-
"required": [
130-
"apiKey",
131-
"faceId",
132-
"intro",
133-
"prompt",
134-
"timeLimit",
135-
"userName",
136-
"voiceId"
137-
]
138-
}
139-
}
140-
}
141-
},
142-
"responses": {
143-
"200": {
144-
"description": "Session start response",
145-
"content": {
146-
"application/json": {
147-
"schema": {
148-
"type": "object",
149-
"properties": {
150-
"message": {
151-
"type": "string"
152-
},
153-
"meetingUrl": {
154-
"type": "string",
155-
"description": "Daily call url to join the session, refer to [this guide](/quickstart#4-accessing-the-meeting-url) for more information"
156-
}
157-
}
158-
}
159-
}
160-
}
161-
}
162-
}
163-
}
164-
},
165-
"/getPossibleVoiceIDs": {
166-
"get": {
167-
"summary": "Get possible voice IDs",
168-
"parameters": [
169-
{
170-
"name": "apiKey",
171-
"in": "query",
172-
"required": true,
173-
"schema": {
174-
"type": "string"
175-
}
176-
}
177-
],
178-
"responses": {
179-
"200": {
180-
"description": "List of possible voice IDs",
181-
"content": {
182-
"application/json": {
183-
"schema": {
184-
"type": "object",
185-
"properties": {
186-
"voiceIDs": {
187-
"type": "object",
188-
"additionalProperties": {
189-
"type": "string"
190-
}
191-
}
192-
}
193-
}
194-
}
195-
}
196-
}
197-
}
198-
}
19985
}
20086
}
20187
}

api-reference/simli-react-sdk.mdx

Whitespace-only changes.

images/LipsyncStream.png

59.5 KB