Opm::gpuistl Namespace Reference

Namespaces

namespace  detail
 

Classes

class  GPUAwareMPISender
 Derived class of GPUSender that handles MPI communication using CUDA-aware MPI. The copyOwnerToAll function uses MPI calls referring to data that resides on the GPU in order to send it directly to other GPUs, skipping the staging step on the CPU. More...
 
class  GpuBlockPreconditioner
 Is an adaptation of Dune::BlockPreconditioner that works within the CuISTL framework. More...
 
class  GpuDILU
 DILU preconditioner on the GPU. More...
 
class  GpuJac
 Jacobi preconditioner on the GPU. More...
 
class  GPUObliviousMPISender
 Derived class of GPUSender that handles MPI calls that should NOT use GPU-direct communication. The implementation moves data from the GPU to the CPU and then sends it using regular MPI. More...
 
class  GpuOwnerOverlapCopy
 CUDA-compatible variant of Dune::OwnerOverlapCopyCommunication. More...
 
class  GPUSender
 GPUSender is a wrapper class for classes that implement copyOwnerToAll. It allows communicators to be created against the generic GPUSender interface, hiding whether the implementation uses GPU-aware MPI or not. More...
 
class  GpuSeqILU0
 Sequential ILU0 preconditioner on the GPU through the CuSparse library. More...
 
class  GpuSparseMatrix
 The GpuSparseMatrix class is a simple wrapper class for a CuSparse matrix. More...
 
class  GpuView
 The GpuView class provides a view of some data allocated on the GPU. Essentially it only stores a pointer and a size. More...
 
class  ISTLSolverGPUISTL
 ISTL solver for GPU using the GPU ISTL backend. More...
 
class  OpmGpuILU0
 ILU0 preconditioner on the GPU. More...
 
class  PointerView
 A view towards a smart pointer to GPU-allocated memory. More...
 
class  PreconditionerAdapter
 Makes a CUDA preconditioner available to a CPU simulator. More...
 
class  PreconditionerConvertFieldTypeAdapter
 Converts the field type (e.g. double to float) to benchmark single precision preconditioners. More...
 
class  PreconditionerCPUMatrixToGPUMatrix
 Convert a CPU matrix to a GPU matrix and use a CUDA preconditioner on the GPU. More...
 
class  PreconditionerHolder
 Common interface for adapters that hold preconditioners. More...
 
class  SolverAdapter
 Wraps a CUDA solver to work with CPU data. More...
 

Enumerations

enum class  MatrixStorageMPScheme { DOUBLE_DIAG_DOUBLE_OFFDIAG = 0 , FLOAT_DIAG_FLOAT_OFFDIAG = 1 , DOUBLE_DIAG_FLOAT_OFFDIAG = 2 }
 

Functions

MatrixStorageMPScheme makeMatrixStorageMPScheme (int scheme)
 
void printDevice ()
 
void setDevice ()
 
 OPM_CREATE_GPU_RESOURCE (GPUStream, cudaStream_t, cudaStreamCreate, cudaStreamDestroy)
 Manages a CUDA stream resource. More...
 
 OPM_CREATE_GPU_RESOURCE (GPUEvent, cudaEvent_t, cudaEventCreate, cudaEventDestroy)
 Manages a CUDA event resource. More...
 
 OPM_CREATE_GPU_RESOURCE (GPUGraph, cudaGraph_t, cudaGraphCreate, cudaGraphDestroy, 0)
 Manages a CUDA graph resource. More...
 
 OPM_CREATE_GPU_RESOURCE_NO_CREATE (GPUGraphExec, cudaGraphExec_t, cudaGraphExecDestroy)
 Manages a CUDA graph execution resource. More...
 
template<typename T >
std::shared_ptr< T > make_gpu_shared_ptr ()
 Creates a shared pointer managing GPU-allocated memory of the specified element type. More...
 
template<typename T >
std::shared_ptr< T > make_gpu_shared_ptr (const T &value)
 Creates a shared pointer managing GPU-allocated memory of the specified element type. More...
 
template<typename T >
auto make_gpu_unique_ptr ()
 Creates a unique pointer managing GPU-allocated memory of the specified element type. More...
 
template<typename T >
auto make_gpu_unique_ptr (const T &value)
 Creates a unique pointer managing GPU-allocated memory of the specified element type. More...
 
template<class T >
T copyFromGPU (const T *value)
 Copies a value from GPU-allocated memory to the host. More...
 
template<class T >
T copyFromGPU (const std::shared_ptr< T > &value)
 Copies a value from GPU-allocated memory to the host. More...
 
template<class T , class Deleter >
T copyFromGPU (const std::unique_ptr< T, Deleter > &value)
 Copies a value from GPU-allocated memory to the host. More...
 
template<class T >
void copyToGPU (const T &value, T *ptr)
 Copies a value from the host to GPU-allocated memory. More...
 
template<class T >
void copyToGPU (const T &value, const std::shared_ptr< T > &ptr)
 Copies a value from the host to GPU-allocated memory using a shared_ptr. More...
 
template<class T , class Deleter >
void copyToGPU (const T &value, const std::unique_ptr< T, Deleter > &ptr)
 Copies a value from the host to GPU-allocated memory using a unique_ptr. More...
 
template<class T >
PointerView< T > make_view (const std::shared_ptr< T > &ptr)
 
template<class T , class Deleter >
PointerView< T > make_view (const std::unique_ptr< T, Deleter > &ptr)
 
void setZeroAtIndexSet (const GpuVector< int > &indexSet)
 The GpuVector class is a simple (arithmetic) vector class for the GPU. More...
 
std::string toDebugString ()
 
void setDevice (int mpiRank, int numberOfMpiRanks)
 Sets the correct CUDA device in the setting of MPI. More...
 
void printDevice (int mpiRank, int numberOfMpiRanks)
 

Enumeration Type Documentation

◆ MatrixStorageMPScheme

Enumerator
DOUBLE_DIAG_DOUBLE_OFFDIAG 
FLOAT_DIAG_FLOAT_OFFDIAG 
DOUBLE_DIAG_FLOAT_OFFDIAG 

Function Documentation

◆ copyFromGPU() [1/3]

template<class T >
T Opm::gpuistl::copyFromGPU ( const std::shared_ptr< T > &  value)

Copies a value from GPU-allocated memory to the host.

Parameters
value: A shared pointer to the value on the GPU.
Returns
The value copied from the GPU.
Note
This function involves a synchronization point and should be used with care.

References copyFromGPU().

◆ copyFromGPU() [2/3]

template<class T , class Deleter >
T Opm::gpuistl::copyFromGPU ( const std::unique_ptr< T, Deleter > &  value)

Copies a value from GPU-allocated memory to the host.

Template Parameters
Deleter: The custom deleter type.
Parameters
value: A unique pointer to the value on the GPU (with a custom deleter).
Returns
The value copied from the GPU.
Note
This function involves a synchronization point and should be used with care.

References copyFromGPU().

◆ copyFromGPU() [3/3]

template<class T >
T Opm::gpuistl::copyFromGPU ( const T *  value)

Copies a value from GPU-allocated memory to the host.

Parameters
value: A pointer to the value on the GPU.
Returns
The value copied from the GPU.
Note
This function involves a synchronization point and should be used with care.

References Opm::gpuistl::detail::isGPUPointer(), and OPM_GPU_SAFE_CALL.

Referenced by copyFromGPU().

◆ copyToGPU() [1/3]

template<class T >
void Opm::gpuistl::copyToGPU ( const T &  value,
const std::shared_ptr< T > &  ptr 
)

Copies a value from the host to GPU-allocated memory using a shared_ptr.

Parameters
value: The value to copy to the GPU.
ptr: A shared_ptr to the GPU-allocated memory.
Note
This function involves a synchronization point, and should be used with care.

References copyToGPU().

◆ copyToGPU() [2/3]

template<class T , class Deleter >
void Opm::gpuistl::copyToGPU ( const T &  value,
const std::unique_ptr< T, Deleter > &  ptr 
)

Copies a value from the host to GPU-allocated memory using a unique_ptr.

Template Parameters
Deleter: The custom deleter type.
Parameters
value: The value to copy to the GPU.
ptr: A unique_ptr to the GPU-allocated memory (with a custom deleter).
Note
This function involves a synchronization point, and should be used with care.

References copyToGPU().

◆ copyToGPU() [3/3]

template<class T >
void Opm::gpuistl::copyToGPU ( const T &  value,
T *  ptr 
)

Copies a value from the host to GPU-allocated memory.

Parameters
value: The value to copy to the GPU.
ptr: A pointer to the GPU-allocated memory.
Note
This function involves a synchronization point and should be used with care.

References Opm::gpuistl::detail::isGPUPointer(), and OPM_GPU_SAFE_CALL.

Referenced by copyToGPU().

◆ make_gpu_shared_ptr() [1/2]

template<typename T >
std::shared_ptr< T > Opm::gpuistl::make_gpu_shared_ptr ( )

Creates a shared pointer managing GPU-allocated memory of the specified element type.

This function allocates memory on the GPU for the type T, using cudaMalloc. It returns a std::shared_ptr that automatically handles the release of GPU memory with cudaFree when no longer in use.

Template Parameters
T: The element type to allocate on the GPU.
Returns
A std::shared_ptr to the GPU-allocated memory.

References OPM_GPU_SAFE_CALL, and OPM_GPU_WARN_IF_ERROR.

◆ make_gpu_shared_ptr() [2/2]

template<typename T >
std::shared_ptr< T > Opm::gpuistl::make_gpu_shared_ptr ( const T &  value)

Creates a shared pointer managing GPU-allocated memory of the specified element type.

This function allocates memory on the GPU for the type T, using cudaMalloc. It returns a std::shared_ptr that automatically handles the release of GPU memory with cudaFree when no longer in use.

Template Parameters
T: The element type to allocate on the GPU.
Parameters
value: The value to copy to the GPU-allocated memory.
Returns
A std::shared_ptr to the GPU-allocated memory.

References OPM_GPU_SAFE_CALL.

◆ make_gpu_unique_ptr() [1/2]

template<typename T >
auto Opm::gpuistl::make_gpu_unique_ptr ( )

Creates a unique pointer managing GPU-allocated memory of the specified element type.

This function allocates memory on the GPU for the type T, using cudaMalloc. It returns a std::unique_ptr that automatically handles the release of GPU memory with cudaFree when no longer in use.

Template Parameters
T: The element type to allocate on the GPU.
Returns
A std::unique_ptr to the GPU-allocated memory.

References OPM_GPU_SAFE_CALL, and OPM_GPU_WARN_IF_ERROR.

◆ make_gpu_unique_ptr() [2/2]

template<typename T >
auto Opm::gpuistl::make_gpu_unique_ptr ( const T &  value)

Creates a unique pointer managing GPU-allocated memory of the specified element type.

This function allocates memory on the GPU for the type T, using cudaMalloc. It returns a std::unique_ptr that automatically handles the release of GPU memory with cudaFree when no longer in use.

Template Parameters
T: The element type to allocate on the GPU.
Parameters
value: The value to copy to the GPU-allocated memory.
Returns
A std::unique_ptr to the GPU-allocated memory.

References OPM_GPU_SAFE_CALL.
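
Taken together, the make_gpu_*_ptr and copy helpers give a scoped-ownership workflow for small device-side objects. The following is a minimal sketch, assuming a CUDA-capable build and that the helpers are available through the gpuistl headers (the header path is not shown in this reference); it is illustrative, not the library's canonical usage:

```cpp
#include <cassert>
#include <memory>

void roundTripExample()
{
    using namespace Opm::gpuistl;

    // Allocate a double on the GPU, initialized from the host value 42.0.
    std::shared_ptr<double> deviceValue = make_gpu_shared_ptr(42.0);

    // Overwrite the device-side value from the host (a synchronizing copy).
    copyToGPU(13.25, deviceValue);

    // Copy it back to the host (again a synchronization point).
    const double hostValue = copyFromGPU(deviceValue);
    assert(hostValue == 13.25);
}   // cudaFree runs automatically when the last shared_ptr copy is destroyed
```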

◆ make_view() [1/2]

template<class T >
PointerView< T > Opm::gpuistl::make_view ( const std::shared_ptr< T > &  ptr)

◆ make_view() [2/2]

template<class T , class Deleter >
PointerView< T > Opm::gpuistl::make_view ( const std::unique_ptr< T, Deleter > &  ptr)
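
make_view presumably produces a non-owning PointerView from an owning smart pointer, so the handle can be passed around (e.g. into device code) without reference-count traffic. A hedged sketch, with PointerView's copy semantics assumed rather than documented here:

```cpp
void viewExample()
{
    using namespace Opm::gpuistl;

    auto owner = make_gpu_shared_ptr(1.0);       // owning shared_ptr<double>
    PointerView<double> view = make_view(owner); // non-owning view

    // Assumption: the view is a cheap value type, so copies do not touch
    // the shared_ptr reference count and are safe to hand to kernels.
    PointerView<double> copy = view;
    (void)copy;
}   // the memory is released only when `owner` goes out of scope
```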

◆ makeMatrixStorageMPScheme()

MatrixStorageMPScheme Opm::gpuistl::makeMatrixStorageMPScheme ( int  scheme)
inline
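
The mapping from an integer to the enum is not documented here, but given the explicit enumerator values above, a plausible hedged re-implementation (the error handling is an assumption, not the library's actual behaviour) looks like this:

```cpp
#include <stdexcept>

// Mirrors the enumerator values documented above.
enum class MatrixStorageMPScheme {
    DOUBLE_DIAG_DOUBLE_OFFDIAG = 0,
    FLOAT_DIAG_FLOAT_OFFDIAG = 1,
    DOUBLE_DIAG_FLOAT_OFFDIAG = 2
};

// Hypothetical sketch: map a user-supplied integer (e.g. from a
// command-line option) onto the enum, rejecting out-of-range values.
MatrixStorageMPScheme makeMatrixStorageMPScheme(int scheme)
{
    switch (scheme) {
    case 0: return MatrixStorageMPScheme::DOUBLE_DIAG_DOUBLE_OFFDIAG;
    case 1: return MatrixStorageMPScheme::FLOAT_DIAG_FLOAT_OFFDIAG;
    case 2: return MatrixStorageMPScheme::DOUBLE_DIAG_FLOAT_OFFDIAG;
    default:
        throw std::invalid_argument("Unknown MatrixStorageMPScheme value");
    }
}
```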

◆ OPM_CREATE_GPU_RESOURCE() [1/3]

Opm::gpuistl::OPM_CREATE_GPU_RESOURCE ( GPUEvent  ,
cudaEvent_t  ,
cudaEventCreate  ,
cudaEventDestroy   
)

Manages a CUDA event resource.

This resource encapsulates a cudaEvent_t handle and provides automatic creation and destruction of the CUDA event. Use this resource to measure elapsed time or synchronize GPU executions between different streams.

◆ OPM_CREATE_GPU_RESOURCE() [2/3]

Opm::gpuistl::OPM_CREATE_GPU_RESOURCE ( GPUGraph  ,
cudaGraph_t  ,
cudaGraphCreate  ,
cudaGraphDestroy  ,
0 
)

Manages a CUDA graph resource.

This resource encapsulates a cudaGraph_t handle and provides automatic creation and destruction of a CUDA graph. It represents a series of operations captured for efficient replay, execution, or modification.

◆ OPM_CREATE_GPU_RESOURCE() [3/3]

Opm::gpuistl::OPM_CREATE_GPU_RESOURCE ( GPUStream  ,
cudaStream_t  ,
cudaStreamCreate  ,
cudaStreamDestroy   
)

Manages a CUDA stream resource.

This resource encapsulates a cudaStream_t handle and provides automatic creation and destruction of the CUDA stream. Use this resource to schedule and synchronize GPU kernels or other asynchronous operations.

◆ OPM_CREATE_GPU_RESOURCE_NO_CREATE()

Opm::gpuistl::OPM_CREATE_GPU_RESOURCE_NO_CREATE ( GPUGraphExec  ,
cudaGraphExec_t  ,
cudaGraphExecDestroy   
)

Manages a CUDA graph execution resource.

This resource encapsulates a cudaGraphExec_t handle and provides automatic destruction of the CUDA graph execution object. It represents the compiled and optimized version of a CUDA graph ready for efficient execution.
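
The generated wrappers follow RAII: construction acquires the CUDA handle and destruction releases it. A hedged usage sketch for timing work on a stream, assuming each wrapper exposes the underlying handle via a get() accessor (that accessor name is an assumption of this sketch):

```cpp
void timedLaunch()
{
    using namespace Opm::gpuistl;

    GPUStream stream;        // cudaStreamCreate in ctor, cudaStreamDestroy in dtor
    GPUEvent start, stop;    // likewise for cudaEventCreate/cudaEventDestroy

    cudaEventRecord(start.get(), stream.get());
    // ... enqueue kernels on stream.get() ...
    cudaEventRecord(stop.get(), stream.get());
    cudaEventSynchronize(stop.get());

    float milliseconds = 0.0f;
    cudaEventElapsedTime(&milliseconds, start.get(), stop.get());
}   // stream and events are released automatically here
```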

◆ printDevice() [1/2]

void Opm::gpuistl::printDevice ( )

Referenced by Opm::Main::initialize_().

◆ printDevice() [2/2]

void Opm::gpuistl::printDevice ( int  mpiRank,
int  numberOfMpiRanks 
)

◆ setDevice() [1/2]

void Opm::gpuistl::setDevice ( )

◆ setDevice() [2/2]

void Opm::gpuistl::setDevice ( int  mpiRank,
int  numberOfMpiRanks 
)

Sets the correct CUDA device in the setting of MPI.

Note
This assumes that every node has equally many GPUs, all of the same capability.
This probably needs to be called before MPI_Init if one uses GPUDirect transfers (see e.g. https://devtalk.nvidia.com/default/topic/752046/teaching-and-curriculum-support/multi-gpu-system-running-mpi-cuda-/ ).
If no CUDA device is present, this does nothing.
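
A common policy for this kind of setup, sketched here without the CUDA calls, is to assign ranks round-robin across the devices visible on a node. The helper below is hypothetical and only models the selection logic; the real setDevice() may differ and would end in a cudaSetDevice call:

```cpp
#include <stdexcept>

// Hypothetical sketch of the usual rank-to-device policy: rank i on a node
// with deviceCount GPUs uses device i % deviceCount.
int chooseDevice(int mpiRank, int deviceCount)
{
    if (deviceCount <= 0) {
        // With no CUDA device present there is nothing to select
        // (the documented setDevice() simply does nothing in that case).
        throw std::runtime_error("No CUDA device available");
    }
    return mpiRank % deviceCount;
}
```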

◆ setZeroAtIndexSet()

void Opm::gpuistl::setZeroAtIndexSet ( const GpuVector< int > &  indexSet)

The GpuVector class is a simple (arithmetic) vector class for the GPU.

Note
We currently only support simple raw primitives for T (double, float and int).
We currently only support arithmetic operations on double and float.
This vector has no notion of block size. The user is responsible for allocating the correct number of primitives (double or float).

Example usage:

void someFunction() {
    auto someDataOnCPU = std::vector<double>({1.0, 2.0, 42.0, 59.9451743, 10.7132692});
    auto dataOnGPU = GpuVector<double>(someDataOnCPU);
    // Multiply by 4.0:
    dataOnGPU *= 4.0;
    // Get the data back on the CPU in another vector:
    auto stdVectorOnCPU = dataOnGPU.asStdVector();
}

Template Parameters
T: The type to store. Can be either float, double or int.
template <typename T>
class GpuVector
{
public:
    using field_type = T;
    using size_type = size_t;

    GpuVector(const GpuVector<T>& other);
    explicit GpuVector(const std::vector<T>& data);
    GpuVector& operator=(const GpuVector<T>& other);

    template <int BlockDimension>
    explicit GpuVector(const Dune::BlockVector<Dune::FieldVector<T, BlockDimension>>& bvector)
        : GpuVector(bvector.dim())
    {
        copyFromHost(bvector);
    }

    GpuVector& operator=(T scalar);
    explicit GpuVector(const size_t numberOfElements);
    GpuVector(const T* dataOnHost, const size_t numberOfElements);
    virtual ~GpuVector();

    T* data();
    const T* data() const;

    template <int BlockDimension>
    void copyFromHost(const Dune::BlockVector<Dune::FieldVector<T, BlockDimension>>& bvector)
    {
        // TODO: [perf] bvector.dim() can be replaced by bvector.N() * BlockDimension
        if (detail::to_size_t(m_numberOfElements) != bvector.dim()) {
            OPM_THROW(std::runtime_error,
                      fmt::format("Given incompatible vector size. GpuVector has size {},\n"
                                  "however, BlockVector has N() = {}, and dim() = {}.",
                                  m_numberOfElements,
                                  bvector.N(),
                                  bvector.dim()));
        }
        const auto dataPointer = static_cast<const T*>(&(bvector[0][0]));
        copyFromHost(dataPointer, m_numberOfElements);
    }

    template <int BlockDimension>
    void copyToHost(Dune::BlockVector<Dune::FieldVector<T, BlockDimension>>& bvector) const
    {
        // TODO: [perf] bvector.dim() can be replaced by bvector.N() * BlockDimension
        if (detail::to_size_t(m_numberOfElements) != bvector.dim()) {
            OPM_THROW(std::runtime_error,
                      fmt::format("Given incompatible vector size. GpuVector has size {},\n"
                                  "however, the BlockVector has N() = {}, and dim() = {}.",
                                  m_numberOfElements,
                                  bvector.N(),
                                  bvector.dim()));
        }
        const auto dataPointer = static_cast<T*>(&(bvector[0][0]));
        copyToHost(dataPointer, m_numberOfElements);
    }

    void copyFromHost(const T* dataPointer, size_t numberOfElements);
    void copyFromHost(const T* dataPointer, size_t numberOfElements, cudaStream_t stream);
    void copyToHost(T* dataPointer, size_t numberOfElements) const;
    void copyFromHost(const std::vector<T>& data);
    void copyToHost(std::vector<T>& data) const;
    void copyFromDeviceToDevice(const GpuVector<T>& other) const;

    void prepareSendBuf(GpuVector<T>& buffer, const GpuVector<int>& indexSet) const;
    void syncFromRecvBuf(GpuVector<T>& buffer, const GpuVector<int>& indexSet) const;

    GpuVector<T>& operator*=(const T& scalar);
    GpuVector<T>& axpy(T alpha, const GpuVector<T>& y);
    GpuVector<T>& operator+=(const GpuVector<T>& other);
    GpuVector<T>& operator-=(const GpuVector<T>& other);

    T dot(const GpuVector<T>& other) const;
    T two_norm() const;
    T dot(const GpuVector<T>& other, const GpuVector<int>& indexSet, GpuVector<T>& buffer) const;
    T two_norm(const GpuVector<int>& indexSet, GpuVector<T>& buffer) const;
    T dot(const GpuVector<T>& other, const GpuVector<int>& indexSet) const;
    T two_norm(const GpuVector<int>& indexSet) const;

    size_type dim() const;
    std::vector<T> asStdVector() const;

    template <int blockSize>
    Dune::BlockVector<Dune::FieldVector<T, blockSize>> asDuneBlockVector() const
    {
        OPM_ERROR_IF(dim() % blockSize != 0,
                     fmt::format("blockSize is not a multiple of dim(). Given blockSize = {}, and dim() = {}",
                                 blockSize,
                                 dim()));
        Dune::BlockVector<Dune::FieldVector<T, blockSize>> returnValue(dim() / blockSize);
        copyToHost(returnValue);
        return returnValue;
    }
};

◆ toDebugString()

std::string Opm::gpuistl::toDebugString ( )

References Opm::to_string().