Aaftab Munshi
Opportunity: Processor Parallelism...
• Today’s processors are increasingly parallel
• CPUs
■ Multiple cores are driving performance increases
• GPUs
■ Transforming into general purpose data-parallel computational coprocessors
■ Improving numerical precision (single and double)
• Runtime
■ resource management
■ execute compute kernels
• Compiler
■ A subset of ISO C99 with appropriate language additions
■ Compile and build compute program executables
■ online or offline
• Compute Program
■ Collection of compute kernels and internal functions
■ Analogous to a dynamic library
execution instances
■ __private
■ __local
■ convert_type<_sat><_roundingmode>
■ Image types
■ image2d_t, image3d_t and sampler_t
■ relational
■ geometric functions
■ synchronization functions
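The `convert_type<_sat><_roundingmode>` form listed above can be illustrated in plain C. This is a minimal sketch of what a saturating float-to-`uchar` conversion does, assuming the semantics of the released OpenCL specifications (values outside the destination range clamp to the nearest representable value, NaN maps to 0, and float-to-integer conversion rounds toward zero by default). The function name `convert_uchar_sat_rtz` is a hypothetical host-side stand-in, not an OpenCL built-in.

```c
#include <stdint.h>

/* Hypothetical plain-C emulation of a saturating float -> uchar conversion
 * with round-toward-zero. Not the OpenCL built-in itself. */
uint8_t convert_uchar_sat_rtz(float x)
{
    if (x != x) {
        return 0;            /* NaN compares unequal to itself; map it to 0 */
    }
    if (x <= 0.0f) {
        return 0;            /* saturate at the low end of the uchar range */
    }
    if (x >= 255.0f) {
        return 255;          /* saturate at the high end of the uchar range */
    }
    return (uint8_t)x;       /* in range: a C cast truncates toward zero */
}
```

Without the `_sat` suffix, an out-of-range input would not be clamped, which is why image-processing kernels that write 8-bit pixels typically use the saturating form.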
// create a work-queue
queue = clCreateWorkQueue(context, NULL, NULL, 0);
memobjs[1] = clCreateBuffer(context,
                            CL_MEM_READ_WRITE,
                            sizeof(float)*2*num_entries, NULL);
// execute kernel
clExecuteKernel(queue, kernel, NULL, range, NULL, 0, NULL);
localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15)));
// four radix-4 function calls
fftRadix4Pass(data); fftRadix4Pass(data + 4);
fftRadix4Pass(data + 8); fftRadix4Pass(data + 12);
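The body of `fftRadix4Pass` is not shown in the slides. As a rough illustration of what a radix-4 pass computes on each group of four complex values, here is a minimal plain-C sketch of a 4-point forward DFT butterfly. The `cfloat` type and the `radix4_butterfly` name are assumptions made for this example; the kernel in the deck may use a different data layout, sign convention, or twiddle-factor handling.

```c
/* Hypothetical complex type for this sketch. */
typedef struct { float re; float im; } cfloat;

static cfloat c_add(cfloat a, cfloat b)
{
    cfloat r = { a.re + b.re, a.im + b.im };
    return r;
}

static cfloat c_sub(cfloat a, cfloat b)
{
    cfloat r = { a.re - b.re, a.im - b.im };
    return r;
}

/* Multiply by -j: (re + j*im) * (-j) = im - j*re. */
static cfloat c_mul_neg_j(cfloat a)
{
    cfloat r = { a.im, -a.re };
    return r;
}

/* In-place 4-point forward DFT of d[0..3], no twiddle factors:
 *   X0 = x0 +   x1 + x2 +   x3
 *   X1 = x0 - j*x1 - x2 + j*x3
 *   X2 = x0 -   x1 + x2 -   x3
 *   X3 = x0 + j*x1 - x2 - j*x3
 */
void radix4_butterfly(cfloat *d)
{
    cfloat s02 = c_add(d[0], d[2]);               /* x0 + x2       */
    cfloat d02 = c_sub(d[0], d[2]);               /* x0 - x2       */
    cfloat s13 = c_add(d[1], d[3]);               /* x1 + x3       */
    cfloat d13 = c_mul_neg_j(c_sub(d[1], d[3]));  /* -j * (x1 - x3) */

    d[0] = c_add(s02, s13);
    d[1] = c_add(d02, d13);
    d[2] = c_sub(s02, s13);
    d[3] = c_sub(d02, d13);
}
```

Calling this on `data`, `data + 4`, `data + 8`, and `data + 12` mirrors the four calls in the snippet above: each call transforms one group of four values held by the work-item. The butterfly needs only additions, subtractions, and swaps, with no multiplications, which is the main appeal of radix 4.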