Add docs explaining the streaming mode (Lightning-AI#1484)
rasbt authored Jun 13, 2024
1 parent 5766ea9 commit 70a818f
Showing 1 changed file with 29 additions and 2 deletions.
tutorials/deploy.md (29 additions, 2 deletions)
@@ -9,7 +9,7 @@ This section illustrates how we can set up an inference server for a phi-2 LLM


 
-## Step 1: Start the inference server
+### Step 1: Start the inference server


```bash
@@ -25,7 +25,7 @@ litgpt serve microsoft/phi-2

 
-## Step 2: Query the inference server
+### Step 2: Query the inference server

You can now send requests to the inference server you started in step 1. For example, in a new Python session, we can query the server as follows:
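
The request code itself is collapsed in this diff view. As a reference, here is a minimal sketch of such a request, inferred from the streaming example added below (same endpoint and payload, just without `stream=True`):

```python
import requests

# Sketch of a basic (non-streaming) request; the /predict endpoint and
# payload are inferred from the streaming example later in this diff.
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"}
)

print(response.json()["output"])
```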

@@ -46,3 +46,30 @@ Executing the code above prints the following output:
```
Example input.
```

 
## Optional streaming mode

The 2-step procedure described above returns the complete response all at once. If you want to stream the response on a token-by-token basis, start the server with the streaming option enabled:

```bash
litgpt serve microsoft/phi-2 --stream true
```

Then, use the following updated code to query the inference server:

```python
import requests

response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"},
    stream=True
)

# Iterate over the streamed chunks as they arrive instead of waiting
# for the full response (response.json() would fail here, since the
# body is a sequence of JSON objects rather than a single one)
for chunk in response.iter_content(chunk_size=None):
    print(chunk, end="")
```

```
b'{"output": "The"}'b'{"output": " corrected"}'b'{"output": " sentence"}'b'{"output": " is"}'b'{"output": ":"}'b'{"output": " Example"}'b'{"output": " input"}'
```
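
Each streamed chunk is a small JSON object of the form `{"output": ...}`. To print only the generated text, you can decode the chunks as they arrive. A minimal sketch, assuming each chunk arrives as exactly one complete JSON object (chunk boundaries are not guaranteed to align with JSON objects in general):

```python
import requests, json

response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Exampel input"},
    stream=True
)

# Decode each chunk and print only the generated text. This assumes
# every chunk is exactly one complete JSON object; a robust client
# would buffer bytes and split on object boundaries instead.
for chunk in response.iter_content(chunk_size=None):
    print(json.loads(chunk)["output"], end="", flush=True)
```

Applied to the streamed output shown above, this prints: `The corrected sentence is: Example input`.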
