
CS701 High Performance Computing

Programming Assignments
A1 - SystemC and OpenMP
SystemC Programming
Points to Note: (a) Soft Deadline: 00:00AM August, 10. Hard Deadline: 00:00AM August, 12. Submissions to be done through
email only. Pack your report, code, screenshots and other files in an archive and mail to cs701.nitk@gmail.com.
(b) Bonus marks for creative problem solving. (c) All are team assignments. Not more than two students in a
team. One submission per team.
Submission guidelines: (a) Assignment report: The answer to each question should typically contain block diagrams/microarchitecture, a brief explanation, and other relevant information. (b) Auxiliary files to submit: For each question, include one or more of the following files along with the report wherever applicable: SystemC code for the design
and the testbench, execution screenshots, VCD dumps, gtkwave screenshots.
1. Full Adder. Implement a combinational full adder (FA).
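Before wiring the design into a SystemC SC_MODULE, the combinational equations for the full adder can be checked in plain C++. This is only a sketch of the logic a combinational SC_METHOD would assign to the module's outputs; the struct and function names are our own.

```cpp
#include <cassert>

// Combinational full-adder equations: sum = a XOR b XOR cin,
// cout = majority(a, b, cin). In a SystemC module these would be the
// assignments made in an SC_METHOD sensitive to all three inputs.
struct FAResult {
    bool sum;
    bool cout;
};

FAResult full_adder(bool a, bool b, bool cin) {
    FAResult r;
    r.sum  = a ^ b ^ cin;                           // XOR of the three inputs
    r.cout = (a && b) || (a && cin) || (b && cin);  // carry when at least two inputs are 1
    return r;
}
```

A quick sanity check is to compare against integer addition: for any inputs, `a + b + cin` equals `2*cout + sum`.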
2. Single Register. Implement an 8-bit register inside a Register Block. The register block takes 3 inputs:
(a) a read bit, (b) a write bit, (c) 8-bit write data. It has one output: 8-bit read data. The register block
works as follows. At the positive edge of the clock:
If the read input is ON, output the value stored in the register.
If the write input is ON, write the value on write data into the register.
If both read and write are ON, the read precedes the write.
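The read-precedes-write rule means a simultaneous read/write returns the old register value while the new one is stored. A cycle-level sketch of that behaviour in plain C++ (not SystemC; in the actual design this would be an SC_METHOD or SC_CTHREAD sensitive to the clock's positive edge, and the class and method names here are our own):

```cpp
#include <cstdint>

// Cycle-level model of the register block: posedge() represents one positive
// clock edge with the three inputs sampled at that edge.
class RegisterBlock {
    uint8_t reg = 0;        // the single 8-bit register
    uint8_t read_data = 0;  // the registered read-data output
public:
    void posedge(bool read, bool write, uint8_t write_data) {
        if (read)  read_data = reg;   // read is serviced first...
        if (write) reg = write_data;  // ...then write (read precedes write)
    }
    uint8_t output() const { return read_data; }
};
```

Because the read happens before the write inside the same edge, asserting both bits with new write data still drives the previous register contents onto the output for that cycle.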
3. Basic interconnection network. Implement a 2-node point-to-point interconnection network as shown in
the following figure.

Implement a version exhibiting the following behaviour: after random intervals, A sends one message to B, and B
responds with 4 replies. A prints the sent and received messages at the output.
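One way to think about the protocol before writing the SystemC version: model each direction of the link as a FIFO (in SystemC these would naturally be sc_fifo channels between modules A and B). The sketch below captures just the message flow; the names and the queue-based modelling are our own, and the random send intervals would come from SystemC wait() calls in the real design.

```cpp
#include <queue>
#include <string>

// Each direction of the point-to-point link is a FIFO channel.
struct Link {
    std::queue<std::string> fifo;
};

// Node A sends one message onto the A->B link.
void node_a_send(Link& a_to_b, const std::string& msg) {
    a_to_b.fifo.push(msg);
}

// Node B drains the A->B link and answers each message with 4 replies
// on the B->A link; returns the number of replies sent.
int node_b_respond(Link& a_to_b, Link& b_to_a) {
    int replies = 0;
    while (!a_to_b.fifo.empty()) {
        a_to_b.fifo.pop();            // B receives one message...
        for (int i = 0; i < 4; ++i) { // ...and responds with 4 replies
            b_to_a.fifo.push("reply " + std::to_string(i));
            ++replies;
        }
    }
    return replies;
}
```

In the SystemC version, A would print each message as it writes it to the channel and each reply as it reads one back.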

OpenMP Programming
1. Hello World program. Fork out multiple threads from the main process. All the threads are assigned
individual identifiers from the main process. Each thread should identify itself and print out a hello world
message. The master thread should print out environment information, which includes: the total number of
CPUs/cores available to OpenMP (use omp_get_num_procs()), the current thread ID in the parallel region,
the total number of threads available in this parallel region, and the total number of threads requested.
2. Sum of Two Arrays. Compute the element-wise sum of two large arrays A and B and populate array C
(C[i] = A[i] + B[i] in a loop). Portions of the arrays are computed in parallel across the team of threads.
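Since every iteration writes a distinct C[i], the loop iterations are independent and can be split across the team with a single worksharing pragma. A sketch (the pragma is a no-op if OpenMP is disabled, leaving a correct serial loop):

```cpp
#include <vector>

// Element-wise sum C[i] = A[i] + B[i]. The iterations are independent, so
// OpenMP can hand each thread a chunk of the index range; no synchronisation
// is needed because each C[i] is written by exactly one thread.
std::vector<int> array_sum(const std::vector<int>& A, const std::vector<int>& B) {
    std::vector<int> C(A.size());
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(A.size()); ++i)
        C[i] = A[i] + B[i];
    return C;
}
```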
3. Matrix Multiply. Implement a parallel multiplication of large matrices (100 x 100 or more). Threads
can share row iterations evenly.

A2 - CUDA and OpenCL


Points to Note: (a) Soft Deadline: 00:00AM August, 22. Hard Deadline: 00:00AM August, 24. Submissions
to be done through email only. Pack your report, code, screenshots and other files in an archive and mail to
cs701.nitk@gmail.com. (b) Bonus marks for creative problem solving. (c) All are team assignments. Not more
than two students in a team. One submission per team.
CUDA and OpenCL can be used to distribute computational tasks between the CPU (the host) and the graphics
accelerator/GPU (the device). Program the following two problems on CUDA and OpenCL platforms. OpenCL
SDK from AMD is here: http://developer.amd.com/tools-and-sdks/.
1. Matrix Multiply. Implement a parallel multiplication of large matrices (100 x 100 or more). Threads
can share row iterations evenly.
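On the GPU the natural decomposition is finer than rows: one thread per output element. A sketch of the CUDA kernel only (not a complete program; the host side would allocate device buffers with cudaMalloc, copy data with cudaMemcpy, and launch the kernel):

```cuda
// One thread computes one element of C = A * B for square N x N matrices
// stored in row-major order. Threads whose indices fall outside the matrix
// (because the grid is rounded up to whole blocks) simply return.
__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Example host-side launch with 16x16 thread blocks:
//   dim3 block(16, 16);
//   dim3 grid((N + 15) / 16, (N + 15) / 16);
//   matmul<<<grid, block>>>(dA, dB, dC, N);
```

An OpenCL version of the same kernel would replace the index computation with get_global_id(0) and get_global_id(1).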
2. SAXPY program. SAXPY: S stands for Single precision, A is a scalar value, X and Y are one-dimensional
vectors, P stands for Plus. The operation is a*X[i] + Y[i]. Write an OpenCL program to perform SAXPY on two
large vectors X and Y. The main program should print out environment details of the host and the device
before beginning computation. You may use functions such as clGetDeviceIDs() and clGetDeviceInfo() to get
the following info: no. of hosts, no. of devices, device type, no. of compute units in the device, clock frequency,
address bits, memory size, and other parameters of your interest.
