Need some help/clarity on installing #12068

Open
philm001 opened this issue Feb 5, 2025 · 1 comment
Comments

philm001 commented Feb 5, 2025

Hello all,

First, I want to apologize, because I know that this doesn't really belong here but rather on the forums. I have posted on the forums and am still waiting to hear back. I sent messages to the mods but haven't heard from them, and I am locked out of certain boards that I would otherwise post to.

I need to ask questions about NeMo installation because I want to learn how to use it, but I am finding it difficult to get help from the community. My only option is to post here on the issues tab. Again, I apologize, but I am not sure what else to do.

I am currently learning how to use the NeMo framework. I have things set up for a SLURM cluster. I have NeMo-Curator installed and I am currently going through its tutorials. (Shout out to Ryan for his great assistance.) While I am going through those tutorials, I want to finish setting up NeMo on my cluster. I hope to begin the NeMo tutorials by mid-month, and when I start them, I want to run them on the SLURM cluster.

Now, just for some background: I am at the point of running NeMo-Curator. I am able to run it on a single machine, and the tutorials seem pretty basic. Do I need the cluster? Maybe not. But I would like to learn how to set things up properly, because in the not-so-distant future I will be upgrading my hardware to do my training much better.

As of right now, I do not have NVIDIA GPUs, but that will soon change. So using the cluster serves two purposes: mainly to speed up the process a little, and also to use my existing resources to learn NeMo before going big. From one of the discussions here, I know it is possible to run NeMo in CPU-only mode. I wouldn't be working with large models just yet. I would start off with BART (< 1 billion parameters) as my "learning model".

So I essentially have two questions about installing everything. I want to make sure that I get the workflow right the first time around.

  1. Now we come to the setup. NeMo says that I need to use it with NeMo-Run and that I can use this script to run on the SLURM cluster. However, NeMo-Curator says that I should use the NVIDIA NeMo Framework Launcher to run under SLURM. Which of the two is the correct approach? Or do I have to use them separately, depending on whether I want to use NeMo for training (NeMo-Run) or for curating data (Framework Launcher)?

  2. Since I am going to be running on the SLURM cluster, I am assuming that each node will need access to the library. There are two options here: (a) install the Docker container or (b) install via pip. For a cluster setup, which one makes the most sense? I don't need the latest and greatest; I just need a simple way to get NeMo v2 up and running on the cluster.

Please let me know your feedback and I look forward to your replies.

philm001 commented Feb 7, 2025

Hello all,

After much consideration, I think I am going to go with NeMo-Run, since it supports NeMo 2.0. The Framework Launcher only supports NeMo 1.0 at this time. I would like to switch over to the launcher once it gets updated.

I also think that, at least for the NeMo-Curator aspect, I will be using the pip installer. Once I get more comfortable with using NeMo, I may switch over to the Docker container, because I do like having everything contained in one place. I need to familiarize myself with working with Docker containers first, and I don't want to stretch myself too thin.
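Since a pip install has to be visible from every node, a quick sanity check could look something like the sketch below. It uses only the Python standard library and reports whether the packages can be found on the node where it runs. The module names `nemo_curator` and `nemo_run` are my assumption for the import names of NeMo-Curator and NeMo-Run, and the file name `check_nodes.py` is hypothetical; adjust both to match the actual install.

```python
import importlib.util
import socket


def check_modules(names):
    """Return {module_name: True/False} for whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names; change these if the packages are named differently.
    wanted = ["nemo_curator", "nemo_run"]
    host = socket.gethostname()
    for name, found in check_modules(wanted).items():
        status = "found" if found else "MISSING"
        print(f"{host}: {name} {status}")
```

Running this once per node (for example through `srun python check_nodes.py` with the same Python environment activated) should show whether every node sees the same installation.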
