SMDV

Description

SMDV is a collection of modules to test the speed of execution of multiplication for different matrices formats.
Multiplications are performed on the GPU using technology NVIDIA CUDA.

Multiplication algortihms ERTILP and SERTILP were originally created for KMLib - SVM with CUDA

Supported formats

Implemented multiplication formats (in CUDA C):

ELLPACK
SLICED ELLPACK
ERTILP
SERTILP
CSR

SMDV also allows use of the multiplication module Numpy. Such calculations will be performed only on the CPU. You can use them as a reference point for measurements performed on the GPU.

Authors

Krzysztof Sopyła "ksirg" krzysztofsopyla@gmail.com
Paweł Drozda "pdrozda"
Sławomir Figiel "fivitti"

If you use this software in academic publication please cite: (TODO)

License

SMDV is licensed under the MIT License.

Functions

The main purpose of the project is to measure the execution time of multiplications on the GPU for different matrix formats. You can repeat our research on your own computer (you only need to have a compatible Nvidia GPU with CUDA).
You can also carry out their own experiments and their own data independently chosen parameters.
SMDV allows easy collaboration with external scripts.

SMDV further includes tools to help take the measurements for different data:

Convert the matrix to the supported formats
- You can use this to see how the matrices are processed before performing the multiplications
- Handled by the module matrixformat.py.
- Command line interfaces in module cli-conv.py.
Return result multiplication
- Methods for multiplying addition to the time the calculation return also the result of these calculations. You can use it to check its accuracy, either as an end in itself. It's a great, quick way to obtain a result of matrix by vector multiplication.
- Handled by the module matrixmultiplication.py.
- Command line interfaces in module cli-multiply.py.
Get CUDA kernels for multiplication
- You can view the source files to see exactly how the multiplication is performed. You can also use the kernels implemented in their programs. In order to further optimize thecodes are written in CUDA C using metaprogramming.
- Files in CUDA C are located in the kernels.
- Module for metaprogramming and code compilation is handled by cudacodes.py.
Get info about matrix and vector files
- You can quickly get information about the many of matrices stored in the format .mtx and Numpy vectors in .npy and export this information to a CSV file.
- Command line interfaces in module cli-info.py.
Simple and fast generating matrix and vectors
- You can generate random matrices and vectors for testing on selected sizes and densities.
- Handled by the module matrixutilites.py.
- Command line interfaces in module cli-gen.py.
And more...

Dependencies

SMDV uses the packages:

PyCUDA
Numpy
SciPy
Click (for CLI)
and standard library, of course

Results

The following are examples of the results of test using the 10 largest matrices from matrix choice of Francisco Vazquez.

The calculations were performed on a computer:

CPU: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
GPU Card: GeForce GTX 780 Ti
CUDA Capability Major: 3.5
CUDA Driver version: 6.0

Version software:

OS: Ubuntu 13.10
python 2.7.5+
PyCuda 2013.1.1
Numpy 1.7.1

Arguments:

Block size: 128
Threads per row: 2
Slice size: 64
Prefetch: 2
Align: 32
Repeats: 10
Compensation: 10 - So much has been made multiplications "starting". Their results were not included.

Test vectors are generated at random in the range of -1 to 1. Their density is 15%. You can download them from the folder vectors.

Informations about matrices:

Matrix	Rows	Columns	NNZ	Sparsity
cant	62451	62451	4007383	0.103%
consph	83334	83334	6010480	0.087%
cop20k_A	121192	121192	2624331	0.018%
mac_econ_fwd500	206500	206500	1273389	0.003%
mc2depi	525825	525825	2100225	0.001%
pwtk	217918	217918	11634424	0.024%
qcd5_4	49152	49152	1916928	0.079%
rma10	46835	46835	2374001	0.108%
shipsec1	140874	140874	7813404	0.039%
wbp256	65537	65537	31413934	0.731%

Results:

Matrix	Format	Avr time [ms]	Std [ms]
cant	CPU	7.658	0.021
	CSR	0.444	0.002
	ELLPACK	0.348	0.001
	SLICED	0.290	0.003
	SERTILP	0.275	0.002
	ERTILP	0.320	0.001
consph	CPU	11.493	0.014
	CSR	0.521	0.001
	ELLPACK	0.419	0.002
	SLICED	0.373	0.001
	SERTILP	0.356	0.001
	ERTILP	0.423	0.001
cop20k_A	CPU	6.627	0.010
	CSR	0.710	0.002
	ELLPACK	0.545	0.001
	SLICED	0.537	0.002
	SERTILP	0.523	0.012
	ERTILP	0.528	0.004
mac_econ_fwd500	CPU	2.734	0.008
	CSR	1.025	0.025
	ELLPACK	0.696	0.001
	SLICED	0.674	0.002
	SERTILP	0.629	0.002
	ERTILP	0.648	0.002
mc2depi	CPU	4.556	0.160
	CSR	1.868	0.003
	ELLPACK	0.818	0.002
	SLICED	0.844	0.001
	SERTILP	0.836	0.003
	ERTILP	0.827	0.001
pwtk	CPU	22.317	0.009
	CSR	1.268	0.001
	ELLPACK	1.073	0.003
	SLICED	0.904	0.002
	SERTILP	0.872	0.001
	ERTILP	1.006	0.002
qcd5_4	CPU	3.224	0.020
	CSR	0.295	0.001
	ELLPACK	0.190	0.001
	SLICED	0.190	0.002
	SERTILP	0.185	0.000
	ERTILP	0.183	0.001
rma10	CPU	3.925	0.007
	CSR	0.330	0.001
	ELLPACK	0.283	0.001
	SLICED	0.245	0.004
	SERTILP	0.226	0.001
	ERTILP	0.258	0.002
shipsec1	CPU	15.051	0.047
	CSR	0.932	0.003
	ELLPACK	0.787	0.005
	SLICED	0.712	0.001
	SERTILP	0.681	0.000
	ERTILP	0.773	0.003
wbp256	CPU	81.732	0.026
	CSR	1.873	0.005
	ELLPACK	1.608	0.002
	SLICED	1.153	0.003
	SERTILP	1.085	0.002
	ERTILP	1.515	0.007

How to run

How to make SMDV to action and self test the speed of matrix multiplication?
There are two ways. You can write a script in Python, or use the attached CLI.

We assume that you have a test matrices and vectors of length equal to the number of columns of the matrix. If not, you can visit MatrixMarket . We also recommend matrix choice of Francisco Vazquez . You can also use tools provided to generate data described in next section or use package Numpy or Scipy.

1. Multiplication in script

We will use module matrixmultiplication.py. It is the core of the whole process. This module include moduls cudacode.py and matrixformat.py. You do not have to worry about it. You only need to use core-module.

1.1 Full script

The complete script multiplication script is in examples/multiplication.py will work according to the scheme:

Read data
Determination of parameters
Matrix multiplication
Results

1.2 ELLPACK multiplication

Call multiply this format is very simple. As any of the methods has in arguments matrix, vector and the number of repetitions of this operation. In addition, a takes one special parameter - block size.

    m = matrixmultiplication.multiply_ellpack(matrix,
                                              vector,
                                              block_size=128,
                                              repeat=10)
    print 'List of execution times: {0}'.format(m[1])

1.3 SLICED multiplication

This method has three specials parameters: threads per row, slice size, align - constant to calculation and set align of the matrix.

    m = matrixmultiplication.multiply_sliced(matrix,
                                             vector,
                                             align=64,
                                             slice_size=32,
                                             threads_per_row=2,
                                             repeat=10)
    print 'List of execution times: {0}'.format(m[1])

1.4 SERTILP multiplication

This method has specials parameters: threads per row, slice size, align, prefetch - number of requests for access to data notified in advance.

    m = matrixmultiplication.multiply_sertilp(matrix,
                                              threads_per_row=2,
                                              slice_size=32,
                                              prefetch=2,
                                              align=64,
                                              repeat=10)
    print 'List of execution times: {0}'.format(m[1])

1.5 ERTILP multiplication

This method has specials parameters: threads per row, block size, prefetch - number of requests for access to data notified in advance.

    m = matrixmultiplication.multiply_ertilp(matrix,
                                             threads_per_row=2,
                                             prefetch=2,
                                             block_size=128,
                                             repeat=10)
    print 'List of execution times: {0}'.format(m[1])

1.6 CSR multiplication

This method has only one specials parameters: block size.

    m = matrixmultiplication.multiply_csr(matrix,
                                          vector,
                                          block_size=128,
                                          repeat=10)
    print 'List of execution times: {0}'.format(m[1])

1.6 CPU (Numpy) multiplication

This method using fuction dot from numpy. It is here for comparison. It works only on CPU.

    m = matrixmultiplication.multiply_cpu(matrix,
                                          vector,
                                          repeat=10)
    print 'List of execution times: {0}'.format(m[1])

2 Multiplication in command line interface (CLI)

CLI to multiply is in module cli-multiply.py. Description all parameters get using flag --help.

As arguments required program takes two paths. First indicates vector, and the second matrix.

If the directory path vectors will then be searched for files .npy. The program alone try to match the vector for the given matrix. Will match the last alphabetically vector of length equal to the number of columns of the matrix If you do not succeed will be displayed message.

If the path is a directory matrix program will search it for files .mtx. Then try to perform calculations for each array in turn.

Specifying of two paths as directory is a simple way to automate testing on a collection.

You can easily specify all the parameters.
Multiplications are making a choice of formats by using the flag: -ell (ELLPACK), -sle (SLICED), -ert (ERTILP), -see (SERTILP), -csr (CSR) and -cpu (multiplication on CPU).

CLI allows you to display the result of multiplication, all execution times, average execution time and standard deviation of the mean.

You can also test the correctness of the returned results with a confidence factor. To do this, use the flag --test. The result will be the result of a standard function dot with Numpy package. If errors do not coming within specified range will be displayed corresponding messages.

You can also save the results of the measurements in the CSV file with the flags -o.

2.1 Examples

Multiplications matrix by vectors in ELLPACK, ERTILP and CPU formats with own parameters and print average time and standard deviation.

    
    user@host:~/SMDV$ python cli-multiply.py -b 128 -tpr 4 -p 2 -ell 
    -ert -cpu -avr -std Data/vectors/Vector_62451.npy Data/real/cant.mtx
    Multiply matrix Data/real/cant.mtx by the vector Data/vectors/Vector_62451.npy
    Multiplication with Numpy (only CPU)
    Average time [ms]: 7.84460783005
    Standard deviation: 0.0
    Multiplication with ELLPACK
    Average time [ms]: 0.6616320014
    Standard deviation: 0.0
    Multiplication with ERTILP
    Average time [ms]: 0.466080009937
    Standard deviation: 0.0

Auto search vector:


    user@host:$ python cli-multiply.py Data/vectors/ Data/real/cant.mtx
    Multiply matrix Data/real/cant.mtx by the vector Data/vectors/Vector_62451.npy

Auto search matrices and vectors:


    user@host:~/projekty/SMDV$ python cli-multiply.py Data/vectors/ Data/real/
    Multiply matrix Data/real/mac_econ_fwd500.mtx by the vector Data/vectors/Vector_206500.npy
    Multiply matrix Data/real/cop20k_A.mtx by the vector Data/vectors/Vector_121192.npy
    Multiply matrix Data/real/wbp256.mtx by the vector Data/vectors/Vector_65537.npy

Data generation

In your script use module matrixutilites.py. Use method generate_sparse_matrix to generate sparse matrix (for very dense is inefficient).
Use method generate_vector to generate dense vector (for very sparse is inefficient).

You can save the matrix generated using method save_matrix_to_file. Format file is Market Matrix File (.mtx).
For vectors use save_vector_to_numpy_file. Format file is Numpy Binary File (.npy). These methods are "overlay" functions scipy.io.mmwrite and numpy.save to help you create a name for the file.

You can use CLI cli-gen to generate data.
Use command matrix to matrix generation. Parameter -s INT INT specifies dimension matrix (width, height). You can use it repeatedly. Please note the exhaustion of memory when generating many large matrices. Using other parameters you can specify the minimum, maximum, percentage of zeros, precision and the type of generated numbers. Use --help for more info.

Use command vector to vector generation. Parameter -l INT specifies length of vector. You can use it repeatedly. For info about other parameters use --help.

Use command save to save all generated data. Can you define directory and filename.

Use command echo to display generated matrix and vectors. Caution when large matrices.

The commands are executed sequentially from left to right. Their order is important! First should be generating command (vectors or/and matrix). Then are (optional) save command and display command.


    user@host:$ python cli-gen.py vector -l 5 -i matrix -s 3 3 -min 1 -max 5 echo
    Vector (len: 5):
      (0)            0
      (1)            4
      (2)           -9
      (3)            5
      (4)            0
    Matrix (rows: 3, cols: 3):
      (0, 0)        5.0
      (0, 1)        1.0
      (1, 1)        1.0
      (1, 2)        4.0
      (2, 1)        5.0

Information about data

cli-info returns basic information about matrices and vectors stored in files, such as size, number of non-zero value and density. You can save this information to a CSV file using parameter -o FILEPATH. You can also view the contents of a small matrix.

CLI as an argument to the paths to files and directories. If you specify the path to the directory will be checked all the matrices and vectors within it.


    user@host:~/SMDV$ python cli-info.py Data/Generated/
    Process: Data/Generated/Matrix_67x67.mtx
    Info about matrix:
      rows         67
      cols         67
      nnz           4
      sparsing [%]    0.0891
    Process: Data/Generated/Vector_5.npy
    Info about vector:
      length         5
      nnz           2
      sparsing [%]      40.0

Building kernels

You can use the formats used by us in their projects. CUDA code is located in the kernels. Kernels are prepared to metaprogramming. It is built and compiled by cudacodes.py. Methods get_cuda_FORMAT (where FORMAT is name of kernel) return tuple with multiplication function and texture. Some of them take arguments to optimize the code for a given situation. If you enter an incorrect value of the code will not work correctly.
Your job is to just set the texture to the correct address in the GPU memory and call the function with the appropriate parameters.
For examples see to matrixmultiplication.py where we use it.

References

"Efficient Sparse Matrix-Vector Multiplication on CUDA", Nathan Bell, Michael Garland [11 December 2008]
"The sparse matrix vector product on GPUs", F. Vazquez, E. M. Garzon, J. A. Martınez, J. J. Fernandez [14 June 2009]
"Improving the performance of the sparse matrix vector product with GPUs", F. Vazquez, G. Ortega, J.J. Fernandez, E.M. Garzon [2010]
"Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures", Alexander Monakov, Anton Lokhmotov, and Arutyun Avetisyan
"A memory efficient and fast sparse matrix vector product on a gpu", A. Dziekonski, A. Lamecki, and M. Mrozowski

Name		Name	Last commit message	Last commit date
Latest commit History 97 Commits
examples		examples
kernels		kernels
vectors		vectors
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
cli-conv.py		cli-conv.py
cli-gen.py		cli-gen.py
cli-info.py		cli-info.py
cli-multiply.py		cli-multiply.py
cudacodes.py		cudacodes.py
filesutilites.py		filesutilites.py
listutilites.py		listutilites.py
matrixformat.py		matrixformat.py
matrixmultiplication.py		matrixmultiplication.py
matrixutilites.py		matrixutilites.py

License

fivitti/SMDV

Folders and files

Latest commit

History

Repository files navigation

SMDV

Description

Supported formats

Authors

License

Functions

Dependencies

Results

How to run

1. Multiplication in script

1.1 Full script

1.2 ELLPACK multiplication

1.3 SLICED multiplication

1.4 SERTILP multiplication

1.5 ERTILP multiplication

1.6 CSR multiplication

1.6 CPU (Numpy) multiplication

2 Multiplication in command line interface (CLI)

2.1 Examples

Data generation

Information about data

Building kernels

References

About

Resources

License

Stars

Watchers

Forks

Languages