Personal copilot blog #1413
Conversation
personal_copilot.md (outdated excerpt):
> ---
> title: "HugCoder 🤗: Train Your Own Coding Assistant 🚀"
> thumbnail: /blog/assets/159_safecoder/thumbnail.jpg
To be updated. An entry also needs to be added to `_blog.yml`.
Left a couple more comments on top of the already great comments here.
Super interesting! I only did a first pass. I agree with @sayakpaul and @BenjaminBossan that the memory computations and training approach (QLoRA vs full fine-tuning) might require a bit more hand-holding.
We potentially don't need to show results from all the experiments you did. For example, we can recommend QLoRA as the cheapest and fastest method, and direct interested readers to the traditional fine-tuning scripts.
personal_copilot.md (outdated excerpt):
> Voila! ⭐️
>
> The demo at the start is this 1B model that is running locally on my Mac laptop.
How many tokens per second are you getting? I think it'd be interesting for the community, as it's a common comparison metric.
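(For anyone who wants to report this, throughput can be measured with a quick script along these lines. This is only a minimal sketch: the checkpoint name and prompt are placeholders, not the exact ones from the post.)

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoderbase-1b"  # placeholder 1B checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens, not the prompt tokens.
new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```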
Really great and comprehensive blog post! Previous comments have already covered most points.
personal_copilot.md (outdated excerpt):
> To keep the serialization of this content relatively memory-friendly, we used chunking and the feather format. Refer to [this script](https://github.com/sayakpaul/hf-codegen/blob/main/data/prepare_dataset.py) for the full implementation.
>
> Our dataset prepared this way is available [here](https://huggingface.co/datasets/sayakpaul/hf-codegen-v2) and it looks like so:
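(To illustrate the chunked Feather serialization idea referenced above, here is a minimal sketch. It is not the linked prepare_dataset.py; the output directory, record schema, and chunk size are made up for illustration.)

```python
import pandas as pd
from pathlib import Path

CHUNK_SIZE = 10_000  # rows per Feather file; illustrative value


def serialize_in_chunks(rows, out_dir="feather_chunks"):
    """Write an iterable of dicts (e.g. {"repo": ..., "path": ..., "content": ...})
    to a series of Feather files so the full corpus never has to sit in memory."""
    Path(out_dir).mkdir(exist_ok=True)
    buffer, chunk_id = [], 0
    for row in rows:
        buffer.append(row)
        if len(buffer) == CHUNK_SIZE:
            pd.DataFrame(buffer).to_feather(f"{out_dir}/chunk_{chunk_id}.feather")
            buffer, chunk_id = [], chunk_id + 1
    if buffer:  # flush the last partial chunk
        pd.DataFrame(buffer).to_feather(f"{out_dir}/chunk_{chunk_id}.feather")
```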
It would be cool to have the cards of the datasets mentioned in the blog filled in.
Great content! Mostly some high-level feedback to make the post a bit easier to follow and read:
- I would avoid adding code just for completeness; you can always link to a repo. Only show code if it is very interesting or useful and you explain it in detail. Inference code appears a few times and I don't think it's necessary for the blog.
- The "Dance of LoRA" section shows an interesting approach, but it is very long. I'd consider shortening it a bit and showing only the most interesting findings and combinations. There are a lot of examples, and after 2-3 it becomes harder to stay focused on them. Also consider showing them as code blocks rather than screenshots; it looks a bit nicer in the post.
- It's fine to show a few examples of where it works and where it doesn't, but I would not take the conclusions too far based on examples alone. We do have benchmarks to check how well models work for chat or code completion, and ultimately one should rely on those to guide decisions. Maybe this is a bit out of scope for this project, but a note would be great.

Hope this helps!
Co-authored-by: Pedro Cuenca <[email protected]> Co-authored-by: Sayak Paul <[email protected]> Co-authored-by: Benjamin Bossan <[email protected]> Co-authored-by: Loubna Ben Allal <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Hello, I've addressed all the comments. I'm planning to release the blog tomorrow (Monday).
personal_copilot.md (outdated excerpt):
> ## Full Finetuning
>
> We will look at how to do full fine-tuning of starcoder-15B on 8 A100 80GB GPUs using the PyTorch Fully Sharded Data Parallel (FSDP) technique. For more information on FSDP, please refer to [Fine-tuning Llama 2 70B using PyTorch FSDP](https://huggingface.co/blog/ram-efficient-pytorch-fsdp) and [Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel](https://huggingface.co/blog/pytorch-fsdp).
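(For context on what the FSDP wiring can look like inside a training script, here is a minimal sketch using 🤗 Accelerate's FSDP plugin. It is not the blog's actual training script; the state-dict settings shown are illustrative choices.)

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig

# Gather full (unsharded) state dicts on rank 0 when saving checkpoints.
fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=True),
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

# The model, optimizer, and dataloaders are then wrapped as usual, e.g.:
# model, optimizer, train_dl = accelerator.prepare(model, optimizer, train_dl)
# and the script is launched across the 8 GPUs with `accelerate launch`.
```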
Let's maintain consistency when referring to model checkpoints. Let's maybe follow `bigcode/starcoder15B`.
personal_copilot.md (outdated excerpt):
> | Model | Pass@1 |
> |---|---|
What does `Pass@1` denote?
personal_copilot.md (outdated excerpt):
> 4. Dataset: [smangrul/hf-stack-v1](https://huggingface.co/datasets/smangrul/hf-stack-v1)
> 5. Trained Model: [smangrul/peft-lora-starcoder15B-v2-personal-copilot-A100-40GB-colab](https://huggingface.co/smangrul/peft-lora-starcoder15B-v2-personal-copilot-A100-40GB-colab)
>
> The command to launch training is given at [run_peft.sh](https://github.com/pacman100/DHS-LLM-Workshop/blob/main/personal_copilot/training/run_peft.sh). The total training time was **12.5 Hours**. Taking the cost of **$1.10 / hr** based on [lambdalabs](https://lambdalabs.com/service/gpu-cloud/pricing), the total cost would be **$13.75**. That's pretty good 🚀! In terms of cost, it's **7.8X** lower than the cost for full fine-tuning.
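(For readers curious what sits behind a QLoRA run like the one linked above, the core setup looks roughly like this. It is only a sketch with `transformers` + `peft`: the checkpoint name and LoRA hyperparameters are illustrative, not the exact values from run_peft.sh.)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 so it fits on a single 40GB A100.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",  # illustrative checkpoint
    quantization_config=bnb_config,
)

# Attach small trainable LoRA adapters on top of the quantized weights.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_proj", "c_attn", "q_attn"],  # illustrative target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable
```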
We have already talked about the memory requirements with and without QLoRA, so I guess it's okay to skip that part here? You have already done it, but just confirming that we don't want to add a sentence about the memory part.
Here, we are comparing the cost of training. I think this is an important metric from the end user's point of view.
Left some comments.
I still think the blog reads a bit heavy. I wouldn't mind splitting it up into multiple blogs for easier readability with specific focus areas:
- Creating a personal code assistant
- Deployment and a VS Code extension
- Mixing of LoRAs for Code LLMs
WDYT?
Hello Sayak, I think there is no need to split this into multiple blog posts, as each sub-post on its own would not carry much signal. I like the current end-to-end structure of the blog. Readers can easily skip sections and come back to the same blog to pick it up when interested.
Co-authored-by: Sayak Paul <[email protected]>
Very nice, left some nits.
Co-authored-by: Pedro Cuenca <[email protected]>
What does this PR do?