FindCUDA¶
Deprecated since version 3.10.
It is no longer necessary to use this module or call find_package(CUDA)
for compiling CUDA code. Instead, list CUDA
among the languages named
in the top-level call to the project()
command, or call the
enable_language()
command with CUDA
.
Then one can add CUDA (.cu
) sources to programs directly
in calls to add_library()
and add_executable()
.
New in version 3.17: To find and use the CUDA toolkit libraries the FindCUDAToolkit
module has superseded this module. It works whether or not the CUDA
language is enabled.
Documentation of Deprecated Usage¶
Tools for building CUDA C files: libraries and build dependencies.
This script locates the NVIDIA CUDA C tools. It should work on Linux, Windows, and macOS and should be reasonably up to date with CUDA C releases.
New in version 3.19: QNX support.
This script makes use of the standard find_package()
arguments of
<VERSION>
, REQUIRED
and QUIET
. CUDA_FOUND
will report if an
acceptable version of CUDA was found.
The script will prompt the user to specify CUDA_TOOLKIT_ROOT_DIR
if
the prefix cannot be determined by the location of nvcc in the system
path and REQUIRED
is specified to find_package()
. To use
a different installed version of the toolkit set the environment variable
CUDA_BIN_PATH
before running cmake (e.g.
CUDA_BIN_PATH=/usr/local/cuda1.0
instead of the default
/usr/local/cuda
) or set CUDA_TOOLKIT_ROOT_DIR
after configuring. If
you change the value of CUDA_TOOLKIT_ROOT_DIR
, various components that
depend on the path will be relocated.
It might be necessary to set CUDA_TOOLKIT_ROOT_DIR
manually on certain
platforms, or to use a CUDA runtime not installed in the default
location. In newer versions of the toolkit the CUDA library is
included with the graphics driver -- be sure that the driver version
matches what is needed by the CUDA runtime version.
Input Variables¶
The following variables affect the behavior of the macros in the
script (in alphabetical order). Note that any of these flags can be
changed multiple times in the same directory before calling
cuda_add_executable()
, cuda_add_library()
, cuda_compile()
,
cuda_compile_ptx()
, cuda_compile_fatbin()
, cuda_compile_cubin()
or cuda_wrap_srcs()
:
CUDA_64_BIT_DEVICE_CODE
(Default: host bit size)Set to
ON
to compile for 64 bit device code, OFF for 32 bit device code. Note that making this different from the host code when generating object or C files from CUDA code just won't work, because size_t gets defined by nvcc in the generated source. If you compile to PTX and then load the file yourself, you can mix bit sizes between device and host.CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE
(Default:ON
)Set to
ON
if you want the custom build rule to be attached to the source file in Visual Studio. Turn OFF if you add the same cuda file to multiple targets.This allows the user to build the target from the CUDA file; however, bad things can happen if the CUDA source file is added to multiple targets. When performing parallel builds it is possible for the custom build command to be run more than once and in parallel causing cryptic build errors. VS runs the rules for every source file in the target, and a source can have only one rule no matter how many projects it is added to. When the rule is run from multiple targets race conditions can occur on the generated file. Eventually everything will get built, but if the user is unaware of this behavior, there may be confusion. It would be nice if this script could detect the reuse of source files across multiple targets and turn the option off for the user, but no good solution could be found.
CUDA_BUILD_CUBIN
(Default:OFF
)Set to
ON
to enable and extra compilation pass with the-cubin
option in Device mode. The output is parsed and register, shared memory usage is printed during build.CUDA_BUILD_EMULATION
(Default:OFF
for device mode)Set to
ON
for Emulation mode.-D_DEVICEEMU
is defined for CUDA C files whenCUDA_BUILD_EMULATION
isTRUE
.CUDA_LINK_LIBRARIES_KEYWORD
(Default:""
)New in version 3.9.
The
<PRIVATE|PUBLIC|INTERFACE>
keyword to use for internaltarget_link_libraries()
calls. The default is to use no keyword which uses the old "plain" form oftarget_link_libraries()
. Note that is matters because whatever is used inside theFindCUDA
module must also be used outside - the two forms oftarget_link_libraries()
cannot be mixed.CUDA_GENERATED_OUTPUT_DIR
(Default:CMAKE_CURRENT_BINARY_DIR
)Set to the path you wish to have the generated files placed. If it is blank output files will be placed in
CMAKE_CURRENT_BINARY_DIR
. Intermediate files will always be placed inCMAKE_CURRENT_BINARY_DIR/CMakeFiles
.CUDA_HOST_COMPILATION_CPP
(Default:ON
)Set to
OFF
for C compilation of host code.CUDA_HOST_COMPILER
(Default:CMAKE_C_COMPILER
)Set the host compiler to be used by nvcc. Ignored if
-ccbin
or--compiler-bindir
is already present in theCUDA_NVCC_FLAGS
orCUDA_NVCC_FLAGS_<CONFIG>
variables. For Visual Studio targets, the host compiler is constructed with one or more visual studio macros such as$(VCInstallDir)
, that expands out to the path when the command is run from within VS.New in version 3.13: If the
CUDAHOSTCXX
environment variable is set it will be used as the default.CUDA_NVCC_FLAGS
,CUDA_NVCC_FLAGS_<CONFIG>
Additional NVCC command line arguments. NOTE: multiple arguments must be semi-colon delimited (e.g.
--compiler-options;-Wall
)New in version 3.6: Contents of these variables may use
generator expressions
.CUDA_PROPAGATE_HOST_FLAGS
(Default:ON
)Set to
ON
to propagateCMAKE_{C,CXX}_FLAGS
and their configuration dependent counterparts (e.g.CMAKE_C_FLAGS_DEBUG
) automatically to the host compiler through nvcc's-Xcompiler
flag. This helps make the generated host code match the rest of the system better. Sometimes certain flags give nvcc problems, and this will help you turn the flag propagation off. This does not affect the flags supplied directly to nvcc viaCUDA_NVCC_FLAGS
or through theOPTION
flags specified throughcuda_add_library()
,cuda_add_executable()
, orcuda_wrap_srcs()
. Flags used for shared library compilation are not affected by this flag.CUDA_SEPARABLE_COMPILATION
(Default:OFF
)If set this will enable separable compilation for all CUDA runtime object files. If used outside of
cuda_add_executable()
andcuda_add_library()
(e.g. callingcuda_wrap_srcs()
directly),cuda_compute_separable_compilation_object_file_name()
andcuda_link_separable_compilation_objects()
should be called.CUDA_SOURCE_PROPERTY_FORMAT
New in version 3.3.
If this source file property is set, it can override the format specified to
cuda_wrap_srcs()
(OBJ
,PTX
,CUBIN
, orFATBIN
). If an input source file is not a.cu
file, setting this file will cause it to be treated as a.cu
file. See documentation for set_source_files_properties on how to set this property.CUDA_USE_STATIC_CUDA_RUNTIME
(Default:ON
)New in version 3.3.
When enabled the static version of the CUDA runtime library will be used in
CUDA_LIBRARIES
. If the version of CUDA configured doesn't support this option, then it will be silently disabled.CUDA_VERBOSE_BUILD
(Default:OFF
)Set to
ON
to see all the commands used when building the CUDA file. When using a Makefile generator the value defaults toVERBOSE
(runmake VERBOSE=1
to see output), although settingCUDA_VERBOSE_BUILD
toON
will always print the output.
Commands¶
The script creates the following functions and macros (in alphabetical order):
cuda_add_cufft_to_target(<cuda_target>)
Adds the cufft library to the target (can be any target). Handles whether you are in emulation mode or not.
cuda_add_cublas_to_target(<cuda_target>)
Adds the cublas library to the target (can be any target). Handles whether you are in emulation mode or not.
cuda_add_executable(<cuda_target> <file>...
[WIN32] [MACOSX_BUNDLE] [EXCLUDE_FROM_ALL] [OPTIONS ...])
Creates an executable <cuda_target>
which is made up of the files
specified. All of the non CUDA C files are compiled using the standard
build rules specified by CMake and the CUDA files are compiled to object
files using nvcc and the host compiler. In addition CUDA_INCLUDE_DIRS
is
added automatically to include_directories()
. Some standard CMake target
calls can be used on the target after calling this macro
(e.g. set_target_properties()
and target_link_libraries()
), but setting
properties that adjust compilation flags will not affect code compiled by
nvcc. Such flags should be modified before calling cuda_add_executable()
,
cuda_add_library()
or cuda_wrap_srcs()
.
cuda_add_library(<cuda_target> <file>...
[STATIC | SHARED | MODULE] [EXCLUDE_FROM_ALL] [OPTIONS ...])
Same as cuda_add_executable()
except that a library is created.
cuda_build_clean_target()
Creates a convenience target that deletes all the dependency files generated. You should make clean after running this target to ensure the dependency files get regenerated.
cuda_compile(<generated_files> <file>... [STATIC | SHARED | MODULE]
[OPTIONS ...])
Returns a list of generated files from the input source files to be used
with add_library()
or add_executable()
.
cuda_compile_ptx(<generated_files> <file>... [OPTIONS ...])
Returns a list of PTX
files generated from the input source files.
cuda_compile_fatbin(<generated_files> <file>... [OPTIONS ...])
New in version 3.1.
Returns a list of FATBIN
files generated from the input source files.
cuda_compile_cubin(<generated_files> <file>... [OPTIONS ...])
New in version 3.1.
Returns a list of CUBIN
files generated from the input source files.
cuda_compute_separable_compilation_object_file_name(<output_file_var>
<cuda_target>
<object_files>)
Compute the name of the intermediate link file used for separable
compilation. This file name is typically passed into
CUDA_LINK_SEPARABLE_COMPILATION_OBJECTS
. output_file_var is produced
based on cuda_target the list of objects files that need separable
compilation as specified by <object_files>
. If the <object_files>
list is
empty, then <output_file_var>
will be empty. This function is called
automatically for cuda_add_library()
and cuda_add_executable()
. Note that
this is a function and not a macro.
cuda_include_directories(path0 path1 ...)
Sets the directories that should be passed to nvcc
(e.g. nvcc -Ipath0 -Ipath1 ...
). These paths usually contain other .cu
files.
cuda_link_separable_compilation_objects(<output_file_var> <cuda_target>
<nvcc_flags> <object_files>)
Generates the link object required by separable compilation from the given
object files. This is called automatically for cuda_add_executable()
and
cuda_add_library()
, but can be called manually when using cuda_wrap_srcs()
directly. When called from cuda_add_library()
or cuda_add_executable()
the
<nvcc_flags>
passed in are the same as the flags passed in via the OPTIONS
argument. The only nvcc flag added automatically is the bitness flag as
specified by CUDA_64_BIT_DEVICE_CODE
. Note that this is a function
instead of a macro.
cuda_select_nvcc_arch_flags(<out_variable> [<target_CUDA_architecture> ...])
Selects GPU arch flags for nvcc based on target_CUDA_architecture
.
Values for target_CUDA_architecture
:
Auto
: detects local machine GPU compute arch at runtime.Common
andAll
: cover common and entire subsets of architectures.<name>
: one ofFermi
,Kepler
,Maxwell
,Kepler+Tegra
,Kepler+Tesla
,Maxwell+Tegra
,Pascal
.<ver>
,<ver>(<ver>)
,<ver>+PTX
, where<ver>
is one of2.0
,2.1
,3.0
,3.2
,3.5
,3.7
,5.0
,5.2
,5.3
,6.0
,6.2
.
Returns list of flags to be added to CUDA_NVCC_FLAGS
in <out_variable>
.
Additionally, sets <out_variable>_readable
to the resulting numeric list.
Example:
cuda_select_nvcc_arch_flags(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0) Maxwell)
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
More info on CUDA architectures: https://en.wikipedia.org/wiki/CUDA. Note that this is a function instead of a macro.
cuda_wrap_srcs(<cuda_target> <format> <generated_files> <file>...
[STATIC | SHARED | MODULE] [OPTIONS ...])
This is where all the magic happens. cuda_add_executable()
,
cuda_add_library()
, cuda_compile()
, and cuda_compile_ptx()
all call this
function under the hood.
Given the list of files <file>...
this macro generates
custom commands that generate either PTX or linkable objects (use PTX
or
OBJ
for the <format>
argument to switch). Files that don't end with .cu
or have the HEADER_FILE_ONLY
property are ignored.
The arguments passed in after OPTIONS
are extra command line options to
give to nvcc. You can also specify per configuration options by
specifying the name of the configuration followed by the options. General
options must precede configuration specific options. Not all
configurations need to be specified, only the ones provided will be used.
For example:
cuda_add_executable(...
OPTIONS -DFLAG=2 "-DFLAG_OTHER=space in flag"
DEBUG -g
RELEASE --use_fast_math
RELWITHDEBINFO --use_fast_math;-g
MINSIZEREL --use_fast_math)
For certain configurations (namely VS generating object files with
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE
set to ON
), no generated file will
be produced for the given cuda file. This is because when you add the
cuda file to Visual Studio it knows that this file produces an object file
and will link in the resulting object file automatically.
This script will also generate a separate cmake script that is used at build time to invoke nvcc. This is for several reasons:
nvcc can return negative numbers as return values which confuses Visual Studio into thinking that the command succeeded. The script now checks the error codes and produces errors when there was a problem.
nvcc has been known to not delete incomplete results when it encounters problems. This confuses build systems into thinking the target was generated when in fact an unusable file exists. The script now deletes the output files if there was an error.
By putting all the options that affect the build into a file and then make the build rule dependent on the file, the output files will be regenerated when the options change.
This script also looks at optional arguments STATIC
, SHARED
, or MODULE
to
determine when to target the object compilation for a shared library.
BUILD_SHARED_LIBS
is ignored in cuda_wrap_srcs()
, but it is respected in
cuda_add_library()
. On some systems special flags are added for building
objects intended for shared libraries. A preprocessor macro,
<target_name>_EXPORTS
is defined when a shared library compilation is
detected.
Flags passed into add_definitions with -D
or /D
are passed along to nvcc.
Result Variables¶
The script defines the following variables:
CUDA_VERSION_MAJOR
The major version of cuda as reported by nvcc.
CUDA_VERSION_MINOR
The minor version.
CUDA_VERSION
,CUDA_VERSION_STRING
Full version in the
X.Y
format.CUDA_HAS_FP16
New in version 3.6: Whether a short float (
float16
,fp16
) is supported.CUDA_TOOLKIT_ROOT_DIR
Path to the CUDA Toolkit (defined if not set).
CUDA_SDK_ROOT_DIR
Path to the CUDA SDK. Use this to find files in the SDK. This script will not directly support finding specific libraries or headers, as that isn't supported by NVIDIA. If you want to change libraries when the path changes see the
FindCUDA.cmake
script for an example of how to clear these variables. There are also examples of how to use theCUDA_SDK_ROOT_DIR
to locate headers or libraries, if you so choose (at your own risk).CUDA_INCLUDE_DIRS
Include directory for cuda headers. Added automatically for
cuda_add_executable()
andcuda_add_library()
.CUDA_LIBRARIES
Cuda RT library.
CUDA_CUFFT_LIBRARIES
Device or emulation library for the Cuda FFT implementation (alternative to
cuda_add_cufft_to_target()
macro)CUDA_CUBLAS_LIBRARIES
Device or emulation library for the Cuda BLAS implementation (alternative to
cuda_add_cublas_to_target()
macro).CUDA_cudart_static_LIBRARY
Statically linkable cuda runtime library. Only available for CUDA version 5.5+.
CUDA_cudadevrt_LIBRARY
New in version 3.7: Device runtime library. Required for separable compilation.
CUDA_cupti_LIBRARY
CUDA Profiling Tools Interface library. Only available for CUDA version 4.0+.
CUDA_curand_LIBRARY
CUDA Random Number Generation library. Only available for CUDA version 3.2+.
CUDA_cusolver_LIBRARY
New in version 3.2: CUDA Direct Solver library. Only available for CUDA version 7.0+.
CUDA_cusparse_LIBRARY
CUDA Sparse Matrix library. Only available for CUDA version 3.2+.
CUDA_npp_LIBRARY
NVIDIA Performance Primitives lib. Only available for CUDA version 4.0+.
CUDA_nppc_LIBRARY
NVIDIA Performance Primitives lib (core). Only available for CUDA version 5.5+.
CUDA_nppi_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 5.5 - 8.0.
CUDA_nppial_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppicc_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppicom_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0 - 10.2. Replaced by nvjpeg.
CUDA_nppidei_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppif_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppig_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppim_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppist_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppisu_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_nppitc_LIBRARY
NVIDIA Performance Primitives lib (image processing). Only available for CUDA version 9.0.
CUDA_npps_LIBRARY
NVIDIA Performance Primitives lib (signal processing). Only available for CUDA version 5.5+.
CUDA_nvcuvenc_LIBRARY
CUDA Video Encoder library. Only available for CUDA version 3.2+. Windows only.
CUDA_nvcuvid_LIBRARY
CUDA Video Decoder library. Only available for CUDA version 3.2+. Windows only.
CUDA_nvToolsExt_LIBRARY
New in version 3.16: NVIDA CUDA Tools Extension library. Available for CUDA version 5+.
CUDA_OpenCL_LIBRARY
New in version 3.16: NVIDA CUDA OpenCL library. Available for CUDA version 5+.