Commit abfcdf1

Improve readme: clarify dependencies and other things to install
1 parent 4e23ad8 commit abfcdf1

1 file changed, +6 -3 lines changed

README.md (+6 -3)
@@ -38,7 +38,8 @@ There is also an even better 110M param model available, see [models](#models).

 ## Meta's Llama 2 models

-As the neural net architecture is identical, we can also inference the Llama 2 models released by Meta. Sadly there is a bit of friction here due to licensing (I can't directly upload the checkpoints, I think). So Step 1, get the Llama 2 checkpoints by following the [Meta instructions](https://github.com/facebookresearch/llama). Once we have those checkpoints, we have to convert them into the llama2.c format. For this we use the `export_meta_llama_bin.py` file, e.g. for 7B model:
+As the neural net architecture is identical, we can also inference the Llama 2 models released by Meta. Sadly there is a bit of friction here due to licensing (I can't directly upload the checkpoints, I think). So Step 1, get the Llama 2 checkpoints by following the [Meta instructions](https://github.com/facebookresearch/llama). Once we have those checkpoints, we have to convert them into the llama2.c format.
+For this we need to install the python dependencies (`pip install -r requirements.txt`) and then use the `export_meta_llama_bin.py` file, e.g. for 7B model:

 ```bash
 python export_meta_llama_bin.py path/to/llama/model/7B llama2_7b.bin
@@ -50,7 +51,7 @@ The export will take ~10 minutes or so and generate a 26GB file (the weights of
 ./run llama2_7b.bin
 ```

-This ran at about 4 tokens/s compiled with OpenMP on 96 threads on my CPU Linux box in the cloud. (On my MacBook Air M1, currently it's closer to 30 seconds per token if you just build with `make runfast`.) Example output:
+This ran at about 4 tokens/s compiled with [OpenMP](#OpenMP) on 96 threads on my CPU Linux box in the cloud. (On my MacBook Air M1, currently it's closer to 30 seconds per token if you just build with `make runfast`.) Example output:

 > The purpose of this document is to highlight the state-of-the-art of CoO generation technologies, both recent developments and those in commercial use. The focus is on the technologies with the highest merit to become the dominating processes of the future and therefore to be technologies of interest to S&T ... R&D. As such, CoO generation technologies developed in Russia, Japan and Europe are described in some depth. The document starts with an introduction to cobalt oxides as complex products and a short view on cobalt as an essential material. The document continues with the discussion of the available CoO generation processes with respect to energy and capital consumption as well as to environmental damage.

@@ -141,7 +142,9 @@ gcc -Ofast -o run run.c -lm

 You can also experiment with replacing `gcc` with `clang`.

-**OpenMP** Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention. You can compile e.g. like so:
+### OpenMP
+Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
+You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). Then you can compile e.g. like so:

 ```bash
 clang -Ofast -fopenmp -march=native run.c -lm -o run
