From bfaefd0873a91aaffaae4254da5734f2fb311f48 Mon Sep 17 00:00:00 2001
From: Connor Baker
Date: Tue, 7 Nov 2023 14:35:37 +0000
Subject: [PATCH] cudaPackages: add docs

---
 doc/languages-frameworks/cuda.section.md      | 47 +++++++++++++++----
 pkgs/development/cuda-modules/README.md       | 32 +++++++++++++
 .../cuda-modules/modules/README.md            | 27 +++++++++++
 3 files changed, 97 insertions(+), 9 deletions(-)
 create mode 100644 pkgs/development/cuda-modules/README.md
 create mode 100644 pkgs/development/cuda-modules/modules/README.md

diff --git a/doc/languages-frameworks/cuda.section.md b/doc/languages-frameworks/cuda.section.md
index 01a4f20da982..11c86e375c61 100644
--- a/doc/languages-frameworks/cuda.section.md
+++ b/doc/languages-frameworks/cuda.section.md
@@ -68,16 +68,45 @@ All new projects should use the CUDA redistributables available in [`cudaPackage
 ### Updating CUDA redistributables {#updating-cuda-redistributables}
 
 1. Go to NVIDIA's index of CUDA redistributables:
-2. Copy the `redistrib_*.json` corresponding to the release to `pkgs/development/compilers/cudatoolkit/redist/manifests`.
-3. Generate the `redistrib_features_*.json` file by running:
+2. Make a note of the new version of CUDA available.
+3. Run
 
-   ```bash
-   nix run github:ConnorBaker/cuda-redist-find-features --
-   ```
+   ```bash
+   nix run github:connorbaker/cuda-redist-find-features -- \
+      download-manifests \
+      --log-level DEBUG \
+      --version \
+      https://developer.download.nvidia.com/compute/cuda/redist \
+      ./pkgs/development/cuda-modules/cuda/manifests
+   ```
 
-   That command will generate the `redistrib_features_*.json` file in the same directory as the manifest.
+   This will download a copy of the manifest for the new version of CUDA.
+4. Run
 
-4. Include the path to the new manifest in `pkgs/development/compilers/cudatoolkit/redist/extension.nix`.
+   ```bash
+   nix run github:connorbaker/cuda-redist-find-features -- \
+      process-manifests \
+      --log-level DEBUG \
+      --version \
+      https://developer.download.nvidia.com/compute/cuda/redist \
+      ./pkgs/development/cuda-modules/cuda/manifests
+   ```
+
+   This will generate a `redistrib_features_.json` file in the same directory as the manifest.
+5. Update the `cudaVersionMap` attribute set in `pkgs/development/cuda-modules/cuda/extension.nix`.
+
+### Updating cuTensor {#updating-cutensor}
+
+1. Repeat the steps present in [Updating CUDA redistributables](#updating-cuda-redistributables) with the following changes:
+   - Use the index of cuTensor redistributables:
+   - Use the newest version of cuTensor available instead of the newest version of CUDA.
+   - Use `pkgs/development/cuda-modules/cutensor/manifests` instead of `pkgs/development/cuda-modules/cuda/manifests`.
+   - Skip the step of updating `cudaVersionMap` in `pkgs/development/cuda-modules/cuda/extension.nix`.
+
+### Updating supported compilers and GPUs {#updating-supported-compilers-and-gpus}
+
+1. Update `nvcc-compatibilities.nix` in `pkgs/development/cuda-modules/` to include the newest release of NVCC, as well as any newly supported host compilers.
+2. Update `gpus.nix` in `pkgs/development/cuda-modules/` to include any new GPUs supported by the new release of CUDA.
 
 ### Updating the CUDA Toolkit runfile installer {#updating-the-cuda-toolkit}
 
@@ -99,7 +128,7 @@ All new projects should use the CUDA redistributables available in [`cudaPackage
    nix store prefetch-file --hash-type sha256
    ```
 
-4. Update `pkgs/development/compilers/cudatoolkit/versions.toml` to include the release.
+4. Update `pkgs/development/cuda-modules/cudatoolkit/releases.nix` to include the release.
 
 ### Updating the CUDA package set {#updating-the-cuda-package-set}
 
@@ -107,7 +136,7 @@ All new projects should use the CUDA redistributables available in [`cudaPackage
 
    - NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
 
-2. Successfully build the closure of the new package set, updating `pkgs/development/compilers/cudatoolkit/redist/overrides.nix` as needed. Below are some common failures:
+2. Successfully build the closure of the new package set, updating `pkgs/development/cuda-modules/cuda/overrides.nix` as needed. Below are some common failures:
 
    | Unable to ... | During ... | Reason | Solution | Note |
    | --- | --- | --- | --- | --- |
diff --git a/pkgs/development/cuda-modules/README.md b/pkgs/development/cuda-modules/README.md
new file mode 100644
index 000000000000..f4844c46a2c2
--- /dev/null
+++ b/pkgs/development/cuda-modules/README.md
@@ -0,0 +1,32 @@
+# cuda-modules
+
+> [!NOTE]
+> This document is meant to help CUDA maintainers understand the structure of the CUDA packages in Nixpkgs. It is not meant to be a user-facing document.
+> For a user-facing document, see [the CUDA section of the manual](../../../doc/languages-frameworks/cuda.section.md).
+
+The files in this directory are added (in some way) to the `cudaPackages` package set by [cuda-packages.nix](../../top-level/cuda-packages.nix).
+
+## Top-level files
+
+Top-level nix files are included in the initial creation of the `cudaPackages` scope. These are typically required for the creation of the finalized `cudaPackages` scope:
+
+- `backend-stdenv.nix`: Standard environment for CUDA packages.
+- `flags.nix`: Flags set, or consumed by, NVCC in order to build packages.
+- `gpus.nix`: A list of supported NVIDIA GPUs.
+- `nvcc-compatibilities.nix`: NVCC releases and the version range of GCC/Clang they support.
+
+## Top-level directories
+
+- `cuda`: CUDA redistributables! Provides extension to `cudaPackages` scope.
+- `cudatoolkit`: monolithic CUDA Toolkit run-file installer. Provides extension to `cudaPackages` scope.
+- `cudnn`: NVIDIA cuDNN library.
+- `cutensor`: NVIDIA cuTENSOR library.
+- `generic-builders`:
+  - Contains a builder `manifest.nix` which operates on the `Manifest` type defined in `modules/generic/manifests`. Most packages are built using this builder.
+  - Contains a builder `multiplex.nix` which leverages the Manifest builder. In short, the Multiplex builder adds multiple versions of a single package to a single instance of the CUDA Packages package set. It is used primarily for packages like `cudnn` and `cutensor`.
+- `modules`: Nixpkgs modules to check the shape and content of CUDA redistributable and feature manifests. These modules additionally use shims provided by some CUDA packages to allow them to re-use the `genericManifestBuilder`, even if they don't have manifest files of their own. `cudnn` and `tensorrt` are examples of packages which provide such shims. These modules are further described in the [Modules](./modules/README.md) documentation.
+- `nccl`: NVIDIA NCCL library.
+- `nccl-tests`: NVIDIA NCCL tests.
+- `saxpy`: Example CMake project that uses CUDA.
+- `setup-hooks`: Nixpkgs setup hooks for CUDA.
+- `tensorrt`: NVIDIA TensorRT library.
diff --git a/pkgs/development/cuda-modules/modules/README.md b/pkgs/development/cuda-modules/modules/README.md
new file mode 100644
index 000000000000..31aa343bd9d5
--- /dev/null
+++ b/pkgs/development/cuda-modules/modules/README.md
@@ -0,0 +1,27 @@
+# Modules
+
+Modules as they are used in `modules` exist primarily to check the shape and content of CUDA redistributable and feature manifests. They are ultimately meant to reduce the repetitive nature of repackaging CUDA redistributables.
+
+Building most redistributables follows a pattern of a manifest indicating which packages are available at a location, their versions, and their hashes. To avoid creating builders for each and every derivation, modules serve as a way for us to use a single `genericManifestBuilder` to build all redistributables.
+
+## `generic`
+
+The modules in `generic` are reusable components meant to check the shape and content of NVIDIA's CUDA redistributable manifests, our feature manifests (which are derived from NVIDIA's manifests), or hand-crafted Nix expressions describing available packages. They are used by the `genericManifestBuilder` to build CUDA redistributables.
+
+Generally, each package which relies on manifests or Nix release expressions will create an alias to the relevant generic module. For example, the [module for CUDNN](./cudnn/default.nix) aliases the generic module for release expressions, while the [module for CUDA redistributables](./cuda/default.nix) aliases the generic module for manifests.
+
+Alternatively, additional fields or values may need to be configured to account for the particulars of a package. For example, while the release expressions for [CUDNN](./cudnn/releases.nix) and [TensorRT](./tensorrt/releases.nix) are very close, they differ slightly in the fields they have. The [module for CUDNN](./modules/cudnn/default.nix) is able to use the generic module for release expressions, while the [module for TensorRT](./modules/tensorrt/default.nix) must add additional fields to the generic module.
+
+### `manifests`
+
+The modules in `generic/manifests` define the structure of NVIDIA's CUDA redistributable manifests and our feature manifests.
+
+NVIDIA's redistributable manifests are retrieved from their web server, while the feature manifests are produced by [`cuda-redist-find-features`](https://github.com/connorbaker/cuda-redist-find-features).
+
+### `releases`
+
+The modules in `generic/releases` define the structure of our hand-crafted Nix expressions containing information necessary to download and repackage CUDA redistributables. These expressions are created when NVIDIA-provided manifests are unavailable or otherwise unusable. For example, though CUDNN has manifests, a bug in NVIDIA's CI/CD causes manifests for different versions of CUDA to use the same name, which leads to the manifests overwriting each other.
+
+### `types`
+
+The modules in `generic/types` define reusable types used in both `generic/manifests` and `generic/releases`.