- Use Qt4.8 designer. Label linedits and checkboxes correctly - give tooltip when appropriate
- Convert ui to py: pyuic4 –output Agilent2DicomQt.py Agilent2DicomQt.ui
- Modify Agilent2DicomAppQt.py
New filters/methods
- Create new method in separate file e.g. cplxfilter.py,kspace_filter.py
- Create new argument in fid2dicom.py
- Create new arguments in fid2dcm.sh
- Add code to GUI backend Agilent2DicomAppQt.py
agilent2dicom_globalvars.py
is readable by shell and python. Make
sure that there are no spaces between variable name, =, or the string
label.
AGILENT2DICOM_VERSION="1.6.4"
AGILENT2DICOM_APP_VERSION="1.8"
FDF2DCMVERSION="1.2"
FID2DCMVERSION="1.4"
After changing the QT gui in designer
use the following line to update the python version:
pyuic4 --output Agilent2DicomQt.py Agilent2DicomQt.ui
Mercurial keyword expansion is used to set some handy variables. The hgrc file in the development .hg folder should be the following:
[paths]
default = ssh://hg@bitbucket.org/mbi-image/agilent2dicom
[extensions]
keyword =
[keyword]
agilent2dicom.py =
agilentFDF2dicom.py =
fdf2dcm.sh =
agilent2dicom_globalvars.py =
fid2dicom.py =
fid2dcm.sh =
Agilent2DicomApp.py =
Agilent2DicomAppQt.py =
[keywordmaps]
Author = {author|user}
Date = {date|utcdate}
Header = {root}/{file},v {node|short} {date|utcdate} {author|user}
Id = {file|basename},v {node|short} {date|utcdate} {author|user}
Revision = {node|short}
Use the kwexpand command to enable config tags to be filled correctly.
hg kwexpand
Using git-rsc-keywords package.
sortpp alphabetises the procpar format for easier comparison using diff. The awk script pastes the label and value lines together, then stores the string in an array. The final output is sorted before being printed.
diff -y <(./sortpp <../ExampleAgilentData/kidney512iso_01.fid/procpar) <(./sortpp <../example_data/s_2014072901/T2-cor_01.fid/procpar)
procdiff implements the above line in a shell script
Use ipython for interactive sessions with kspace_filter:
ipython -i ./kspace_filter.py -- -v -i ../ExampleAgilentData/kidney512iso_01.fid/ -o ../ExampleAgilentData/kidney512iso_01.dcm/
kspgauss =kspacegaussian_filter2(ksp,256/0.707)
image_filtered = fftshift(ifftn(ifftshift(kspgauss)))
# print "Saving Gaussian image"
save_nifti(normalise(np.abs(image_filtered)),'gauss_kspimage')
print "Computing Gaussian sub-band1.5 image from Original image"
kspsubband =fouriergausssubband15(ksp.shape,0.707/256)
image_filtered = fftshift(ifftn(ifftshift(kspsubband)))
# print "Saving Gaussian image"
save_nifti(normalise(np.abs(image_filtered)),'gauss_subband')
image_corr = inhomogeneousCorrection(ksp,ksp.shape,3.0/60.0)
save_nifti(np.abs(image_filtered/image_corr),'image_inhCorr3')
F{ D^2 f(x,y,z) } = (-2 pi k)^2 F(k)
print "Computing Laplacian enhanced image"
laplacian = fftshift(ifftn(ifftshift(kspgauss * fourierlaplace(ksp.shape))))
alpha=ndimage.mean(np.abs(image_filtered))/ndimage.mean(np.abs(laplacian))
image_lfiltered = image_filtered - 0.5*alpha*laplacian
save_nifti(np.abs(image_lfiltered),'laplacian_enhanced')
print "Computing Gaussian Laplace image from Smoothed image"
ksplog=kspacelaplacegaussian_filter(ksp,256.0/0.707)
image_filtered = fftshift(ifftn(ifftshift(ksplog)))
image_filtered= (np.abs(image_filtered))
image_filtered= (image_filtered-ndimage.minimum(image_filtered))/(ndimage.maximum(image_filtered) - ndimage.minimum(image_filtered))
save_nifti(np.abs(image_filtered),'Log_image')
Flaplace = fourierlaplace(ksp.shape)
#Flaplace = (Flaplace - ndimage.minimum(Flaplace))/(ndimage.maximum(Flaplace)-ndimage.minimum(Flaplace))
Fsmooth =fouriergauss(ksp.shape,(4.0*np.sqrt(2.0*np.log(2.0)))/512.0)
Fgauss = fouriergauss(ksp.shape,0.707/512) #
laplacian = fftshift(ifftn(ifftshift(ksp * (Fgauss/ndimage.maximum(Fgauss)) * Flaplace * (Fsmooth/ndimage.maximum(Fsmooth)) )))
smoothlaplacian = (laplacian - ndimage.minimum(laplacian))/(ndimage.maximum(laplacian)-ndimage.minimum(laplacian))
print "Saving Smoothed Gauss Laplacian"
save_nifti(np.abs(smoothlaplacian),'Log_smoothed')
image_filtered = fftshift(ifftn(ifftshift(ksplog*Fsmooth)))
save_nifti(np.abs(image_filtered),'Log_smoothed2')
test_double_resolution(kspgauss,'Gauss')
test_double_resolution(ksplog,'LoG')
test_depth_algorithm(fftshift(ifftn(ifftshift(kspgauss))),'gauss_kspdepth')
test_double_resolution_depth(kspgauss,'gauss_depth_large')
https://scipy-lectures.github.io/advanced/image_processing/ https://scipy-lectures.github.io/packages/scikit-image/
using PIL http://nbviewer.ipython.org/github/mroberts3000/GpuComputing/blob/master/ IPython/GaussianBlur.ipynb
http://www.lancs.ac.uk/~struijke/density/kernel.html http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/ebooks/html/anr/anrhtmlframe33.html
kspace_filter.py has many bespoke ndarray operations that would be suitable for optimisation using cython
- kspace_filter.py was copied to filterkspace.pyx
- define data types in all the functions as per http://docs.cython.org/src/userguide/numpy_tutorial.html#numpy-tutorial
from __future__ import division
import numpy as np
cimport numpy as np
...
DTYPE = np.int
NP_FLOAT = np.float32
NP_CMPLX = np.complex64
...
ctypedef np.int_t DTYPE_t
ctypedef np.float32_t NP_FLOAT_t
ctypedef np.complex64_t NP_CMPLX_t
...
def fouriercoords(np.ndarray siz):
...
cdef np.ndarray sz = np.ceil(np.array(siz) / 2.0)
cdef np.ndarray xx = np.array(range(-int(sz[0]), int(sz[0])))
cdef np.ndarray yy = np.array(range(-int(sz[1]), int(sz[1])))
cdef np.ndarray zz
cdef DTYPE_t maxlen = ndimage.maximum(np.array(siz))
cdef np.ndarray mult_fact, uu, vv, ww
- Include type and size checks in each function
if siz.ndim != 1 and (siz.size != 3 or siz.size != 2):
raise ValueError("Only 2D or 3D dimensions on filter supported")
assert siz.dtype == DTYPE
- Create a setup script (setup.py)
- Include numpy headers for MASSIVE in filterkspace header _numpyconfig.h not found, even with numpy.get_headers() in setup. Numpy build path put at top of filterkspace.pyx:
# distutils: include_dirs = /usr/local/src/PYTHON/NUMPY/numpy-1.7.0/numpy/core/include/ /usr/local/src/PYTHON/NUMPY/numpy-1.7.0/build/src.linux-x86_64-2.7/numpy/core/include/numpy/
- Compile:
python setup.py build_ext --inplace
Note: for MASSIVE users the default gcc produces errors. Use module load gcc/4.7.2 mpfr/3.1.2 gmp/5.1.1
At the top of filterkspace.pyx. place this:
# cython: profile=True
speed up type definitions in function arguments
...
def fouriergauss(np.ndarray[DTYPE_t, ndim=1] siz, np.ndarray[NP_FLOAT_t, ndim=1] sigma):
...
cdef np.ndarray[DTYPE_t, ndim=1] sz = np.ceil(np.array(siz) / 2.0)
cdef np.ndarray[DTYPE_t, ndim=1] xx = np.array(range(-int(sz[0]), int(sz[0])))
cdef np.ndarray[DTYPE_t, ndim=1] yy = np.array(range(-int(sz[1]), int(sz[1])))
cdef np.ndarray[DTYPE_t, ndim=1] zz
cdef DTYPE_t maxlen = ndimage.maximum(np.array(siz))
cdef np.ndarray[NP_FLOAT_t, ndim=3] mult_fact, uu, vv, ww
Disable numpy specific bounds checking
http://docs.cython.org/src/tutorial/numpy.html
cimport cython
@cython.boundscheck(False) # turn of bounds-checking for entire function
def fouriercoords(np.ndarray[DTYPE_t, ndim=1] siz):
...
@cython.boundscheck(False) # turn of bounds-checking for entire function
def fouriergauss(np.ndarray[DTYPE_t, ndim=1] siz, np.ndarray[NP_FLOAT_t, ndim=1] sigma):
Notes for DrDobbs opencl with python Building and Deploying a Kernel
To build and deploy a basic OpenCL kernel, you usually need to follow these steps in a typical OpenCL C++ host program:
Obtain an OpenCL platform. Obtain a device id for at least one device (accelerator). Create a context for the selected device or devices. Create the accelerator program from source code. Build the program. Create one or more kernels from the program functions. Create a command queue for the target device. Allocate device memory and move input data from the host to the device memory. Associate the arguments to the kernel with kernel object. Deploy the kernel for device execution. Move the kernel’s output data to host memory. Release context, program, kernels and memory.
These steps represent a simplified version of the tasks that your host program must perform (each step is a bit more complex in real life). For example, the first step (obtain an OpenCL platform) usually requires checking the properties for the platforms, as I explained in the previous section. In addition, each step requires error checking. Because you can work with PyOpenCL from any Python console, you can execute the different steps with an interactive environment that makes it easy for you to learn both OpenCL and the way PyOpenCL exposes the features in the API.
Cuda 5.5 and python 2.7.8 are enabled on MASSIVE
module load gcc/4.7.2 mpfr/3.1.2 gmp/5.1.1 cuda/5.5
See cuda directory.
CUDADevice with properties on MASSIVE (see http://en.wikipedia.org/wiki/Nvidia_Tesla for comparative properties)
Property Value Name: ‘Tesla M2070-Q’ ComputeCapability: 2 SupportsDouble: Yes DriverVersion: 5.5 ToolkitVersion 5 MaxThreadsPerBlock 1024 MaxShmemPerBlock 49152 MaxThreadBlockSize [1024 1024 64] MaxGridSize [65535 65535 65535] SIMDWidth 32 TotalMemory 5.6366e+09 FreeMemory 5.5217e+09 MultiprocessorCount 4 ClockRateKHz 1147000 GPUOverlapsTransfers 1 KernelExecutionTimeout 1 CanMapHostMemory 1 ComputeMode Default
pyfft3d.py
The resulting normalised error should be around 3e-7, given the nature of single-precision floating point numbers. 512x512x512 3D images (usertime)
CPU (s) GPU (write+exec+get) ms fftn 19.9 278+107+281 fftn 20.2 278+75+280 ifftn 32.6 285+78+272 total 72.7 sec 1935 ms 97.3% speedup Second run (total runtime)
CPU (s) GPU (write+exec+get) ms fftn 22.7 294+130+682 fftn 25.9 292+130+681 ifftn 42.8 286+129+681 total 91.4 sec 3305 ms 96.4% speedup
1024x1024x1024 3D images: Does not work - memory overflow errors.
256x256x256 3D images:
CPU (s) GPU (write+exec+get) ms fftn 2.65 37+15.9+84.3 fftn 2.56 37+15.5+84.8 ifftn 4.68 38.5+15.4+205 total 9.89 sec 533.4 ms 94.6% speedup
pyfft2dslice.py
Cutting down large volumes using 2d slices or slabs of 3D images.
Four slab 3D fft 1024/4=256 (1024x1024x256 )x4 Reference matrix multilpication on CPU: 1min 59s (cold) 13.3 s (hot) (1024x1024x256 )x4 3D images:
CPU (sec) 1024x1024x1024 GPU (write+exec+get)x4 sec (1024x1024x256 )x4 fftn 3min 49s 1.78+.262+2.44+1.76+.261+2.47+1.71+.26+1.89+1.75+.26+1.87= 16.713 fftn 3min 53s 1.75+.261+1.85+1.76+.261+1.85+1.69+.261+1.85+1.75+.26+1.84= 15.383 ifftn 6min 13s 1.77+.261+1.9+1.79+.261+1.89+1.71+.261+1.9+1.77+.26+1.88= 15.653 total 48.2 92.7% speedup
Run two 9: (1024x1024x256 )x4 3D images:
CPU (sec) 1024x1024x1024 GPU (write+exec+get)x4 sec (1024x1024x256 )x4 fftn 3min 37s 2.23+.263+1.83+2.22+.261+1.92+2.19+.262+1.94+2.23+.261+1.92= 17.527 fftn 3min 39s 2.24+.261+1.83+2.24+.261+1.92+2.19+.261+1.94+2.22+.261+1.92=17.544 ifftn 6min 20s 2.54+.261+1.83+2.08+.261+1.92+2.04+.261+1.94+2.09+.261+1.91=17.394 total 13min 36s (835 s) 52.465 93.7% speedup, Norm error
Notes: actual FT output has not been compared to standard FT output
pip install --user scikits.cuda
Example: cufft.py
Reconstruction from complex kspace
N = 512
batch_size = 512
print 'Testing recon ifft (in-place)..'
tic()
x = np.empty((batch_size, N, N), np.complex64)
x = np.asarray(np.random.rand(batch_size, N, N), np.complex64)
x.imag = np.random.rand(batch_size, N, N)
x_gpu = gpuarray.to_gpu(x)
# xf_gpu = gpuarray.empty((batch_size, N, N), np.complex64)
plan = cu_fft.Plan((batch_size, N, N), np.complex64, np.complex64, 1)
toc()
tic()
cu_fft.ifft(x_gpu, x_gpu, plan, True)
toc()
tic()
y = numpy.fft.ifftn(x)
toc()
print 'Success status: ', np.allclose(y, x_gpu.get(), atol=1e-6)
In [48]: plan = cu_fft.Plan((batch_size, N, N), np.complex64, np.complex64, 1) In [49]: toc() Elapsed time is 14.3354101181 seconds. Out[49]: '14.3354740143' In [50]: tic() In [51]: cu_fft.ifft(x_gpu, x_gpu, plan, True) In [52]: toc() Elapsed time is 3.01940393448 seconds. Out[52]: '3.0195069313' In [53]: tic() In [54]: y = numpy.fft.ifftn(x) In [55]: toc() Elapsed time is 35.2300841808 seconds. Out[55]: '35.2301981449' In [56]: print 'Success status: ', np.allclose(y, x_gpu.get(), atol=1e-6) Success status: True
Speed test 2: exp(-(a^2 + b^2 + c^2)) blocks=6400 block_size = 128 Using nbr_values == 819200 Calculating 1000 iterations SourceModule time and first three results: 0.053058s, [ 0.54412651 0.54412651 0.54412651] Elementwise time and first three results: 0.204747s, [ 0.54412651 0.54412651 0.54412651] Elementwise Python looping time and first three results: 0.300244s, [ 0.54412651 0.54412651 0.54412651] GPUArray time and first three results: 3.665145s, [ 0.54412651 0.54412651 0.54412651] CPU time and first three results: 31.908086s, [ 0.54412651 0.54412651 0.54412651]
Speed test 3: exp(-2*pi^2(a^2 + b^2 + c^2))
Reikna IFFT time and first three results: 12.443403s, [ 3.83110513e-05 6.72008329e-05 9.83252550e-05] Numpy IFFTN time and first three results: 43.439137s, [ 3.83110483e-05 6.72008469e-05 9.83252633e-05] 2.22041709388e-07
import numpy as np
## REINKA
import pycuda.driver as drv
import pycuda.tools
import pycuda.autoinit
from reikna.cluda import dtypes, any_api
from reikna.fft import FFT
from reikna.core import Annotation, Type, Transformation, Parameter
# create two timers so we can speed-test each approach
start = drv.Event()
end = drv.Event()
api = any_api()
thr = api.Thread.create()
N=512
start.record()
data_dev = thr.to_device(ksp)
cifft = ifft.compile(thr)
cifft(data_dev, data_dev,inverse=0)
result = numpy.fft.fftshift(data_dev.get() / N**3)
result = result[::-1,::-1,::-1]
result = np.roll(np.roll(np.roll(result,1,axis=2),1,axis=1),1,axis=0)
end.record()
end.synchronize()
secs = start.time_till(end)*1e-3
print "Reikna IFFT time and first three results:"
print "%fs, %s" % (secs, str(np.abs(result[:3,0,0])))
start.record()
reference = numpy.fft.fftshift(numpy.fft.ifftn(ksp))
end.record()
end.synchronize()
secs = start.time_till(end)*1e-3
print "Numpy IFFTN time and first three results:"
print "%fs, %s" % (secs, str(np.abs(reference[:3,0,0])))
print "Norm difference", numpy.linalg.norm(result - reference) / numpy.linalg.norm(reference)
Reikna IFFT time and first three results: 11.012042s, [ 3.83110513e-05 6.72008329e-05 9.83252550e-05] Numpy IFFTN time and first three results: 43.887570s, [ 3.83110483e-05 6.72008469e-05 9.83252633e-05]
Norm difference 2.22041709388e-07
from pyfft.cl import Plan
import pyopencl as cl
import pyopencl.array as cl_array
tic()
ctx = cl.create_some_context(interactive=False)
queue = cl.CommandQueue(ctx)
w = h = k = 512
plan = Plan((w,h,k), normalize=True, queue=queue)
gpu_data = cl_array.to_device(queue, ksp)
plan.execute(gpu_data.data, inverse=True)
result = gpu_data.get()
result = np.fft.fftshift(result)
print "PyFFT OpenCL IFFT time and first three results:"
print "%s sec, %s" % (toc(), str(np.abs(result[:3,0,0])))
tic()
reference = np.fft.fftshift(np.fft.ifftn(ksp))
print "Numpy IFFTN time and first three results:"
print "%s sec, %s" % (toc(), str(np.abs(reference[:3,0,0])))
print "Calulating L1 norm "
print np.linalg.norm(result - reference) / np.linalg.norm(reference)
PyFFT OpenCL IFFT time and first three results: 4.00547480583 sec, [ 3.83110491e-05 6.72008246e-05 9.83252612e-05]
Numpy IFFTN time and first three results: 46.8406469822 sec, [ 3.83110483e-05 6.72008469e-05 9.83252633e-05]
Calulating L1 norm 4.21388290199e-07
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy as np
import scikits.cuda.fft as cu_fft
N=512
batch_size=512
tic()
x_gpu = gpuarray.to_gpu(ksp)
plan_inverse = cu_fft.Plan((N, N), np.complex64, np.complex64, batch_size)
cu_fft.ifft(x_gpu, x_gpu, plan_inverse, True)
result = np.fft.fftshift(x_gpu.get() )
#result = np.fft.fftshift(data_dev.get() / N**3)
#result = result[::-1,::-1,::-1]
#result = np.roll(np.roll(np.roll(result,1,axis=2),1,axis=1),1,axis=0)
print "Scikits CUDA IFFT time and first three results:"
print "%s sec, %s" % (toc(), str(np.abs(result[:3,0,0])))
Scikits CUDA IFFT time and first three results: 4.06088709831 sec, [ 0.00219029 0.00188084 0.00227454]
See README file in matlab/NLmeans