Update README.md
Signed-off-by: AT <[email protected]>
manyoso authored May 14, 2023
1 parent 8f3c884 commit 171eee4
Showing 1 changed file with 3 additions and 38 deletions.
41 changes: 3 additions & 38 deletions README.md
@@ -1,5 +1,5 @@
<h1 align="center">GPT4All</h1>
-<p align="center">Open-source assistant-style large language models that run locally on CPU</p>
+<p align="center">Open-source assistant-style large language models that run locally on your CPU</p>

<p align="center">
<a href="https://gpt4all.io">GPT4All Website</a>
@@ -80,7 +80,6 @@ If you have older hardware that only supports avx and not avx2 you can use these

* [Ubuntu - avx-only](https://gpt4all.io/installers/gpt4all-installer-linux-avx-only.run)

-
Find the most up-to-date information on the [GPT4All Website](https://gpt4all.io/)

### Chat Client building and running
@@ -93,43 +92,9 @@ Find the most up-to-date information on the [GPT4All Website](https://gpt4all.io
* <a href="https://github.com/nomic-ai/gpt4all/tree/main/gpt4all-bindings/typescript">:computer: Official Typescript Bindings</a>


-## Training GPT4All-J
-
-Please see [GPT4All-J Technical Report](https://static.nomic.ai/gpt4all/2023_GPT4All-J_Technical_Report_2.pdf) for details.
-
-### GPT4All-J Training Data
-
-- We are releasing the curated training data for anyone to replicate GPT4All-J here: [GPT4All-J Training Data](https://huggingface.co/datasets/nomic-ai/gpt4all-j-prompt-generations)
-- [Atlas Map of Prompts](https://atlas.nomic.ai/map/gpt4all-j-prompts-curated)
-- [Atlas Map of Responses](https://atlas.nomic.ai/map/gpt4all-j-response-curated)
-
-We have released updated versions of our `GPT4All-J` model and training data.
-
-- `v1.0`: The original model trained on the v1.0 dataset
-- `v1.1-breezy`: Trained on a filtered dataset where we removed all instances of AI language model
-- `v1.2-jazzy`: Trained on a filtered dataset where we also removed instances like I'm sorry, I can't answer... and AI language model
-
-The [models](https://huggingface.co/nomic-ai/gpt4all-j) and [data](https://huggingface.co/datasets/nomic-ai/gpt4all-j-prompt-generations) versions can be specified by passing a `revision` argument.
-
-For example, to load the `v1.2-jazzy` model and dataset, run:
-
-```python
-from datasets import load_dataset
-from transformers import AutoModelForCausalLM
-
-dataset = load_dataset("nomic-ai/gpt4all-j-prompt-generations", revision="v1.2-jazzy")
-model = AutoModelForCausalLM.from_pretrained("nomic-ai/gpt4all-j-prompt-generations", revision="v1.2-jazzy")
-```
-
-### GPT4All-J Training Instructions
-
-```bash
-accelerate launch --dynamo_backend=inductor --num_processes=8 --num_machines=1 --machine_rank=0 --deepspeed_multinode_launcher standard --mixed_precision=bf16 --use_deepspeed --deepspeed_config_file=configs/deepspeed/ds_config_gptj.json train.py --config configs/train/finetune_gptj.yaml
-```
-
## Contributing
-GPT4All welcomes contribution, involvment, and discussion from the open source community!
-Please see CONTRIBUTING.md and follow the issue, bug report, and PR markdown templates.
+GPT4All welcomes contributions, involvement, and discussion from the open source community!
+Please see CONTRIBUTING.md and follow the issues, bug reports, and PR markdown templates.

Check project discord, with project owners, or through existing issues/PRs to avoid duplicate work.
Please make sure to tag all of the above with relevant project identifiers or your contribution could potentially get lost.
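
A note on the `revision` example removed by this commit: it passes the dataset ID `nomic-ai/gpt4all-j-prompt-generations` to `AutoModelForCausalLM.from_pretrained`, while the "models" link in the same section points to `nomic-ai/gpt4all-j`. The sketch below assumes the latter is the intended model repository and that both repositories carry the three revision tags listed in the removed text. The helper names `pinned_sources` and `load_pinned` are hypothetical, introduced only for illustration.

```python
# Assumption: the model repo is the one linked as "models" in the removed text.
MODEL_REPO = "nomic-ai/gpt4all-j"
DATASET_REPO = "nomic-ai/gpt4all-j-prompt-generations"
# Revision tags named in the removed README section.
KNOWN_REVISIONS = ("v1.0", "v1.1-breezy", "v1.2-jazzy")


def pinned_sources(revision):
    """Return (repo, revision) pairs so model and dataset use the same tag."""
    if revision not in KNOWN_REVISIONS:
        raise ValueError(
            f"unknown revision {revision!r}; expected one of {KNOWN_REVISIONS}"
        )
    return {
        "model": (MODEL_REPO, revision),
        "dataset": (DATASET_REPO, revision),
    }


def load_pinned(revision):
    """Download the dataset and model at one revision (needs network access)."""
    # Imported lazily so pinned_sources() works without these packages installed.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM

    sources = pinned_sources(revision)
    dataset_repo, dataset_rev = sources["dataset"]
    model_repo, model_rev = sources["model"]
    dataset = load_dataset(dataset_repo, revision=dataset_rev)
    model = AutoModelForCausalLM.from_pretrained(model_repo, revision=model_rev)
    return dataset, model
```

Calling `load_pinned("v1.2-jazzy")` would then fetch both artifacts at the same tag; validating the tag first keeps a typo from silently falling back to a different revision.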
