Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get nvml.h from Anaconda package in Makefile #30

Merged
merged 13 commits into from
Dec 1, 2021

Conversation

XuehaiPan
Copy link
Contributor

Copy the nvml.h from the Anaconda package anaconda.org/nvidia/cuda-nvml-dev (latest version with the main label).

Resolves #29.

cc @elezar

@klueska
Copy link
Contributor

klueska commented Nov 25, 2021

I'm fine with some of the changes in this PR, but as a general policy I don't want to be relying on a third-party repo to pull nvml.h from. I realize NVIDIA's own repos are behind, but that is where we should improve, not moving to a third party one.

@elezar
Copy link
Member

elezar commented Nov 25, 2021

Thanks for the PR @XuehaiPan.

@klueska I also initially thought it was a thirdparty repo, but I believe this is maintained by NVIDIA and is probably available now that the CUDA Python bindings are moving to GA. You do make a good point in that we would have to check the update cadence of this repo though.

@klueska
Copy link
Contributor

klueska commented Nov 25, 2021

Ah, OK, if it is maintained by NVIDIA, then I am OK with it.
And if it's tied to the python bindings releases -- even better.
The NVML team updates these in lockstep with updates to the C library.

@elezar
Copy link
Member

elezar commented Nov 25, 2021

I have pinged the team internally to confirm that this is released regularly. Will update the PR here once I get feedback.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think this looks good. I do think that targeting a specific version is useful, so I don't want to lose that if possible.

@klueska
Copy link
Contributor

klueska commented Nov 25, 2021

I see. This was ran through a linter. If we can standardize on the linter settings and are happy with the output it produces, I am OK with the changes as is.

@XuehaiPan
Copy link
Contributor Author

I see. This was ran through a linter. If we can standardize on the linter settings and are happy with the output it produces, I am OK with the changes as is.

I ran markdownlint with the default settings (integrated with VS Code). No other issues are produced now (I think Bare URLs are acceptable).

Screenshot markdownlint

@XuehaiPan XuehaiPan force-pushed the get-nvml-h branch 2 times, most recently from abbd9ff to 5184837 Compare November 26, 2021 10:31
@klueska
Copy link
Contributor

klueska commented Nov 26, 2021

Can you also add a make target to run markdownlint with the settings you used for the README. This would help us to reduce churn in the README over time.

@XuehaiPan XuehaiPan force-pushed the get-nvml-h branch 2 times, most recently from c10da36 to fe7eac9 Compare November 26, 2021 13:38
@XuehaiPan
Copy link
Contributor Author

XuehaiPan commented Nov 26, 2021

Can you also add a make target to run markdownlint with the settings you used for the README.

@klueska I add a new target in Makefile. The docker image for markdown/markdown at dockerhub is too old, so I do not using dockered version in Makefile. User who wants to run the target need to install markdown-cli in their system (npm install markdownlint or brew install markdownlint-cli).

Update: use dockered markdownlint.

@XuehaiPan XuehaiPan force-pushed the get-nvml-h branch 2 times, most recently from ad83566 to ffe14d3 Compare November 27, 2021 05:44
@XuehaiPan
Copy link
Contributor Author

XuehaiPan commented Nov 27, 2021

Changes in summary:

  1. Update nvml.h from anaconda package nvidia/cuda-nvml-dev.

  2. Remove option CUDA_VERSION while running make update-nvml-h. List all available packages and let the user pick one:

    $ make update-nvml-h
    Found 5 NVML packages:
    
    No.  Version   Upload Time          Package
      1  11.5.50   2021-11-23-22:46:02  nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2
      2  11.4.120  2021-11-03-22:08:33  nvidia/cuda-nvml-dev/11.4.120/linux-64/cuda-nvml-dev-11.4.120-hb8c74d6_0.tar.bz2
      3  11.4.43   2021-09-08-00:10:30  nvidia/cuda-nvml-dev/11.4.43/linux-64/cuda-nvml-dev-11.4.43-he36855d_0.tar.bz2
      4  11.3.58   2021-09-08-00:36:34  nvidia/cuda-nvml-dev/11.3.58/linux-64/cuda-nvml-dev-11.3.58-hc25e488_0.tar.bz2
      5  11.3.58   2021-09-08-00:36:31  nvidia/cuda-nvml-dev/11.3.58/linux-64/cuda-nvml-dev-11.3.58-h70090ce_0.tar.bz2
    
    Pick an NVML package to update ([1]-5): 1
    
    NVML version: 11.5.50
    Package: nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2
    
    Updating nvml.h to 11.5.50 from https://api.anaconda.org/download/nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2 ...
    Successfully updated nvml.h to 11.5.50.

    NOTE: the table is sorted by version and upload time, the first one is always the latest.

  3. Insert NVML version at the top of file nvml.h.

  4. Update README.md style and add target markdownlint in Makefile.


New command dependencies to run the Makefile:

  1. wget
  2. tar (probably built-in on *nix systems)
  3. sort / cut / tr (probably built-in on *nix systems)
  4. jq (dockered)
  5. markdownlint (dockered)

@XuehaiPan XuehaiPan force-pushed the get-nvml-h branch 2 times, most recently from 41e4f59 to 0e47b3f Compare November 27, 2021 16:12
Signed-off-by: Xuehai Pan <[email protected]>
@elezar elezar self-requested a review November 29, 2021 09:38
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @XuehaiPan. This is looking good now. I'm still waiting on confirmation about those packages, but I don't think that this should be a blocker if @klueska is also onboard with the changes.

@klueska
Copy link
Contributor

klueska commented Nov 29, 2021

I'm getting the following error:

$ make update-nvml-h
Found 5 NVML packages:

No.  Version   Upload Time          Package
  1  11.5.50   2021-11-23-22:46:02  nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2
  2  11.4.120  2021-11-03-22:08:33  nvidia/cuda-nvml-dev/11.4.120/linux-64/cuda-nvml-dev-11.4.120-hb8c74d6_0.tar.bz2
  3  11.4.43   2021-09-08-00:10:30  nvidia/cuda-nvml-dev/11.4.43/linux-64/cuda-nvml-dev-11.4.43-he36855d_0.tar.bz2
  4  11.3.58   2021-09-08-00:36:34  nvidia/cuda-nvml-dev/11.3.58/linux-64/cuda-nvml-dev-11.3.58-hc25e488_0.tar.bz2
  5  11.3.58   2021-09-08-00:36:31  nvidia/cuda-nvml-dev/11.3.58/linux-64/cuda-nvml-dev-11.3.58-h70090ce_0.tar.bz2

Pick an NVML package to update ([1]-5): 1

NVML version: 11.5.50
Package: nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2

Updating nvml.h to 11.5.50 from https://api.anaconda.org/download/nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2 ...
tar: Option -a is not permitted in mode -x
make: *** [update-nvml-h] Error 1

@klueska
Copy link
Contributor

klueska commented Nov 29, 2021

Can we hide the output of the command here:

$ make markdownlint
docker run \
		--rm \
		-v /Users/kklues/projects/go-workspace/src/github.com/nvidia/go-nvml:/Users/kklues/projects/go-workspace/src/github.com/nvidia/go-nvml \
		-w /Users/kklues/projects/go-workspace/src/github.com/nvidia/go-nvml \
		markdownlint/markdownlint:latest \
		--rules=~no-hard-tabs,~line-length \
		README.md

@XuehaiPan
Copy link
Contributor Author

I'm getting the following error:

$ make update-nvml-h
Found 5 NVML packages:

No.  Version   Upload Time          Package
  1  11.5.50   2021-11-23-22:46:02  nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2
  2  11.4.120  2021-11-03-22:08:33  nvidia/cuda-nvml-dev/11.4.120/linux-64/cuda-nvml-dev-11.4.120-hb8c74d6_0.tar.bz2
  3  11.4.43   2021-09-08-00:10:30  nvidia/cuda-nvml-dev/11.4.43/linux-64/cuda-nvml-dev-11.4.43-he36855d_0.tar.bz2
  4  11.3.58   2021-09-08-00:36:34  nvidia/cuda-nvml-dev/11.3.58/linux-64/cuda-nvml-dev-11.3.58-hc25e488_0.tar.bz2
  5  11.3.58   2021-09-08-00:36:31  nvidia/cuda-nvml-dev/11.3.58/linux-64/cuda-nvml-dev-11.3.58-h70090ce_0.tar.bz2

Pick an NVML package to update ([1]-5): 1

NVML version: 11.5.50
Package: nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2

Updating nvml.h to 11.5.50 from https://api.anaconda.org/download/nvidia/cuda-nvml-dev/11.5.50/linux-64/cuda-nvml-dev-11.5.50-h511b398_0.tar.bz2 ...
tar: Option -a is not permitted in mode -x
make: *** [update-nvml-h] Error 1

Seems tar on macOS does not support auto-compress in extract mode:

     -a, --auto-compress
             (c mode only) Use the archive suffix to decide a set of the format and the compressions.  As a simple example,
                   tar -a -cf archive.tgz source.c source.h
             creates a new archive with restricted pax format and gzip compression,
                   tar -a -cf archive.tar.bz2.uu source.c source.h
             creates a new archive with restricted pax format and bzip2 compression and uuencode compression,
                   tar -a -cf archive.zip source.c source.h
             creates a new archive with zip format,
                   tar -a -jcf archive.tgz source.c source.h
             ignores the “-j” option, and creates a new archive with restricted pax format and gzip compression,
                   tar -a -jcf archive.xxx source.c source.h
             if it is unknown suffix or no suffix, creates a new archive with restricted pax format and bzip2 compression.

I'll assume the packages are always in .tar.bz2 format (replace extract flag xaf to xjf).

Signed-off-by: Xuehai Pan <[email protected]>
@XuehaiPan
Copy link
Contributor Author

XuehaiPan commented Nov 29, 2021

@klueska The command tar and sed have different command-line options on macOS and Linux. I use dockered sed if running the Makefile on macOS. I think the code will work as expected both on macOS and Linux.

@klueska
Copy link
Contributor

klueska commented Dec 1, 2021

We may want to move to somewhere other than the anaconda repos in the future, but this flow is a definite improvement over the existing one, so let's get it merged.

@XuehaiPan thanks for working on this!

@klueska klueska merged commit 3d70828 into NVIDIA:master Dec 1, 2021
@XuehaiPan XuehaiPan deleted the get-nvml-h branch December 1, 2021 11:52
@elezar
Copy link
Member

elezar commented Dec 1, 2021

Just an update. I can confirm that these are official repositories which are updated as part of the CUDA release process. See also https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#conda

@elezar
Copy link
Member

elezar commented Dec 1, 2021

Thanks again for the PR @XuehaiPan!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update nvml.h from anaconda.org/nvidia rather than CUDA docker images
3 participants