Alpaca Libre

A small research project exploring how much it would cost to create an Alpaca-like dataset using a slightly different approach. All data byproducts are CC0-licensed.

Remember that developing a model based on data you generated via a model API might violate the terms of service of the API provider.

Usage

Clone the repo: git clone https://github.com/mobarski/alpaca-libre && cd alpaca-libre
Install the required Python modules: pip install -r requirements.txt
View / edit generate.py
Set the API key: export OPENAI_KEY=...
Run the script: python3 generate.py

Attribution

data/seed_tasks.jsonl - from the Self-Instruct paper
data/alpaca_libre_prompt_v1.txt - from the Alpaca paper (with slight modification)

Output

The output file is in JSONL format: one task (a JSON object) per line. Each task object has the following items:

status - anything other than 'ok' is bad
instruction
input
output
other
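
As a minimal sketch of how the output could be consumed, the snippet below parses JSONL lines and keeps only tasks whose status is 'ok'. The helper name load_ok_tasks and the sample records are hypothetical, and the contents of the "other" object are an assumption (shown here as empty).

```python
import json

def load_ok_tasks(lines):
    """Yield task objects whose status is 'ok' from an iterable of JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        if task.get("status") == "ok":
            yield task

# Hypothetical sample records; real records come from the generated output file.
sample = [
    '{"status": "ok", "instruction": "Add the numbers.", "input": "2, 3", "output": "5", "other": {}}',
    '{"status": "error", "instruction": "", "input": "", "output": "", "other": {}}',
]
tasks = list(load_ok_tasks(sample))
print(len(tasks))
```

To read a real file, pass an open file handle instead of the sample list; the output file name is not stated here, so check generate.py for it.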

References

GitHub repos:

Papers:

Changelog

0.3
- parallel main loop
- better cli output
- output format change (everything not essential is placed in the "other" object)
- basic output quality check
- fix: multiline input/output handling
- fix: no initial space / empty section handling
- fix:
