Hello all,
First, I want to apologize because I know that this doesn't really belong here but rather on the forums themselves. But I have posted on the forums and am still waiting to hear back. I sent messages to the mods but haven't heard from them, and I am locked out of certain boards that I would otherwise post to.
I need to ask questions about NeMo installation because I want to learn how to use it, but I am finding it difficult to get help from the community. My only option is to post here on the issues tab. Again, I apologize, but I am not sure what else to do.
I am currently learning how to use the NeMo framework. I have things set up for a SLURM cluster. I have NeMo-Curator installed and I am currently going through the tutorials (shout out to Ryan for his great assistance). While I am going through those tutorials, I want to finish setting up NeMo on my cluster. I hope to begin the NeMo tutorials by mid-month. When I start them, I want to run them under the SLURM cluster.
Now, just for some background: I am at the point of running NeMo-Curator. I am able to run it on a single machine, and the tutorials seem pretty basic. Do I need the cluster? Maybe not. But I would like to learn how to set things up properly, because in the not-so-distant future I will be upgrading my hardware to do my training much better.
As of right now, I do not have NVIDIA GPUs, but that will soon change. So the purpose of using the cluster is two-fold: mainly to speed up the process a little bit, and also to use my existing resources to learn NeMo before going big. From one of the discussions here, I know it is possible to run NeMo in CPU-only mode. I wouldn't be working with large models just yet. I would start off with BART (< 1 billion parameters) as my "learning model".
So I essentially have 2 questions about installing everything. I want to make sure that I get the workflow right the first time around.
Now we come to the setup. NeMo says that I need to use it with NeMo-Run and that I can use this script to run on the SLURM cluster. However, NeMo-Curator says that I should use the NVIDIA NeMo Framework Launcher to run under SLURM. Given these two, which one is the correct approach? Or do I have to use them separately, depending on whether I want to use NeMo for training (NeMo-Run) or for curating data (Framework Launcher)?
Since I am going to be running on the SLURM cluster, I am assuming that each node will need access to the library. There are 2 options here: 1. install the Docker container, or 2. install via pip. For a cluster setup, which one makes the most sense? I don't need the latest and greatest; I just need a simple way to get NeMo v2 up and running on the cluster.
Please let me know your feedback and I look forward to your replies.
After much consideration, I think I am going to go with NeMo-Run, since it supports NeMo 2.0. The Framework Launcher only supports NeMo 1.0 at this time. I would like to switch over to the launcher once it gets updated.
I also think that, at least for the NeMo-Curator aspect, I will be using the pip installer. Once I get more comfortable with using NeMo, I may switch over to the Docker container because I do like having everything contained in one space. I need to familiarize myself with working with Docker containers and I don't want to stretch myself too thin.
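For anyone following along with the pip route: since each node needs access to the libraries, here is a minimal sketch of how I could check that on a node. It only uses the Python standard library. The module names `nemo`, `nemo_run`, and `nemo_curator` are my assumption about what the pip packages install, and `check_packages` and `format_report` are just helper names I made up for illustration.

```python
import importlib.util
import socket


def check_packages(names):
    """Return {package_name: True/False} for whether each package is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def format_report(hostname, status):
    """Build a one-line report such as 'node01: nemo=ok nemo_run=MISSING'."""
    parts = [f"{name}={'ok' if found else 'MISSING'}" for name, found in status.items()]
    return f"{hostname}: " + " ".join(parts)


if __name__ == "__main__":
    # Assumed top-level module names; adjust to match what pip actually installed.
    packages = ["nemo", "nemo_run", "nemo_curator"]
    print(format_report(socket.gethostname(), check_packages(packages)))
```

If this were saved as `check_env.py` (a hypothetical file name), it could probably be launched once per node with something like `srun --nodes=<N> --ntasks-per-node=1 python check_env.py`, so that any node missing a package shows up in the output. This only checks that the packages can be found, not that they actually work.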