Skip to content

amirgholami/inplace

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Inplace

CUDA and OpenMP implementations of the C2R and R2C inplace transposition algorithms. These algorithms are described in our PPoPP paper.

We have included a specialization for very tall, skinny matrices that yields good performance for in-place conversions between Arrays of Structures and Structures of Arrays.

The code includes OpenMP and CUDA implementations. The OpenMP implementation is declared in <inplace/openmp.h>, while the CUDA implementation is declared in <inplace/transpose.h>, and carries the following signatures:

namespace inplace {

void transpose(bool row_major, float* data, int m, int n);
void transpose(bool row_major, double* data, int m, int n);

}

About

CUDA and OpenMP implementations of C2R/R2C inplace transposition

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 53.2%
  • Cuda 27.1%
  • C++ 19.3%
  • Makefile 0.4%