Exercise 1
- Log in to the homework machine:
ssh username@homework.cs.tufts.edu
- Copy the example files
- In your home directory, create a subdirectory for the example codes
and then cd to it.
mkdir openMP
cd openMP
- Then, copy either the Fortran or the C version of the parallel OpenMP
exercise files to your openMP subdirectory:
cp /comp/140/public_html/labs/lab7/samples/* ~/openMP
- List the contents of your openMP subdirectory
You should see the example files you just copied. Note: most of these are
simple example files; their primary purpose is to demonstrate the basics of
how to parallelize a code with OpenMP. Most execute in a second or two.
- Create, compile and run an OpenMP "Hello world" program
- Using your favorite text editor (vi/vim, emacs, nedit, gedit, nano...) open a
new file - call it whatever you'd like.
- Create a simple OpenMP program that does the following:
- Creates a parallel region
- Has each thread in the parallel region obtain its thread id
- Has each thread print "Hello World" along with its unique thread id
- Has only the master thread obtain and then print the total number of
threads
If you need help, see the provided omp_hello.c file.
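For reference, a minimal sketch of such a program is shown below. It is only
a sketch; the provided omp_hello.c may differ in its details.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int nthreads, tid;

        /* Fork a team of threads; each thread gets its own copy of tid */
        #pragma omp parallel private(tid)
        {
            tid = omp_get_thread_num();           /* this thread's id */
            printf("Hello World from thread = %d\n", tid);

            if (tid == 0) {                       /* master thread only */
                nthreads = omp_get_num_threads(); /* size of the team */
                printf("Number of threads = %d\n", nthreads);
            }
        }   /* all threads join the master thread here */
        return 0;
    }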
- Using gcc, compile your hello
world OpenMP program. This may take several attempts if there are any code
errors. For example:
gcc -fopenmp omp_hello.c -o hello
When you get a clean compile, proceed.
- Run your hello executable and notice its output.
- Is it what you expected? As a comparison, you can compile and run the
provided omp_hello.c or
omp_hello.f example program.
- How many threads were created? By default, the GNU OpenMP runtime
creates one thread for each core.
- Vary the number of threads and re-run Hello World
- Explicitly set the number of threads to use by means of the OMP_NUM_THREADS
environment variable:
export OMP_NUM_THREADS=8
- Your output should look something like the following:
Hello World from thread = 0
Hello World from thread = 3
Hello World from thread = 2
Number of threads = 8
Hello World from thread = 6
Hello World from thread = 1
Hello World from thread = 4
Hello World from thread = 7
Hello World from thread = 5
- Run your program several times and observe the order of print statements.
Notice that the order of output is more or less random.
This completes Exercise 1
Exercise 2
- Review / Compile / Run the workshare1 example code
This example demonstrates use of the OpenMP loop work-sharing construct.
Notice that it specifies dynamic scheduling with a chunk size, i.e. the
number of iterations handed to a thread at a time. (A sketch of such a
loop appears at the end of this item.)
- First, set the number of threads to 4:
export OMP_NUM_THREADS=4
- After reviewing the source code, use your preferred compiler to compile
and run the executable. For example:
gcc -fopenmp omp_workshare1.c -o workshare1
./workshare1 | sort
- Review the output. Note that it is piped through the sort utility.
This will make it easier to view how loop iterations were actually
scheduled across the team of threads.
- Run the program a couple more times and review the output.
What do you see? Typically, dynamic scheduling is not deterministic.
Every time you run the program, different threads can run different
chunks of work. A thread may even do no work at all because another
thread is quicker and claims more chunks; in principle, one thread
could end up doing all of the work.
- Edit the workshare1 source file and change the dynamic scheduling to
static scheduling.
- Recompile and run the modified program. Notice the difference in
output compared to dynamic scheduling. Specifically, notice that
thread 0 gets the first chunk, thread 1 the second chunk, and so on.
- Run the program a couple more times. Does the output change?
With static scheduling, the allocation of work is deterministic and
should not change between runs, and every thread gets work to do.
- Reflect on possible performance differences between dynamic and
static scheduling.
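For reference, a dynamically scheduled loop of the kind omp_workshare1.c
demonstrates looks roughly like the sketch below; the array size, chunk
size, and other details of the actual file may differ. Changing "dynamic"
to "static" in the schedule clause is the edit described above.

    #include <stdio.h>
    #include <omp.h>

    #define N     100
    #define CHUNK  10   /* iterations handed to a thread at a time */

    int main(void)
    {
        float a[N], b[N], c[N];
        int i;

        for (i = 0; i < N; i++)
            a[i] = b[i] = (float)i;

        #pragma omp parallel shared(a, b, c) private(i)
        {
            /* dynamic: chunks are dealt out as threads become free;
               static:  chunks are assigned round-robin before the loop runs */
            #pragma omp for schedule(dynamic, CHUNK)
            for (i = 0; i < N; i++) {
                c[i] = a[i] + b[i];
                printf("Thread %d: c[%d] = %f\n", omp_get_thread_num(), i, c[i]);
            }
        }
        return 0;
    }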
- Review / Compile / Run the matrix multiply example code
This example performs a matrix multiply by distributing the iterations
of the operation among the available threads.
- After reviewing the source code, compile and run the program. For example:
gcc -fopenmp omp_mm.c -o matmult
./matmult
- Review the output. It shows which thread did each iteration and the
final result matrix.
- Run the program again, however this time sort the output to clearly see
which threads execute which iterations:
./matmult | sort | grep Thread
Do the loop iterations match the SCHEDULE(STATIC,CHUNK) directive for
the matrix multiply loop in the code? (Compare with the sketch below.)
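As a point of comparison, the statically scheduled loop in such a matrix
multiply typically looks something like the sketch below; the matrix sizes,
chunk size, and structure of the actual omp_mm.c may differ. Because of
schedule(static, CHUNK), each thread should report contiguous blocks of
CHUNK row indices in the sorted output.

    #include <stdio.h>
    #include <omp.h>

    #define NRA   62   /* rows of A (assumed sizes for this sketch)   */
    #define NCA   15   /* cols of A = rows of B                       */
    #define NCB    7   /* cols of B                                   */
    #define CHUNK 10   /* rows of C assigned to a thread at a time    */

    int main(void)
    {
        static double a[NRA][NCA], b[NCA][NCB], c[NRA][NCB]; /* zero-initialized */
        int i, j, k;

        for (i = 0; i < NRA; i++) for (j = 0; j < NCA; j++) a[i][j] = i + j;
        for (i = 0; i < NCA; i++) for (j = 0; j < NCB; j++) b[i][j] = i * j;

        #pragma omp parallel shared(a, b, c) private(i, j, k)
        {
            /* each thread computes whole rows of C, CHUNK rows at a time */
            #pragma omp for schedule(static, CHUNK)
            for (i = 0; i < NRA; i++) {
                printf("Thread=%d did row=%d\n", omp_get_thread_num(), i);
                for (j = 0; j < NCB; j++)
                    for (k = 0; k < NCA; k++)
                        c[i][j] += a[i][k] * b[k][j];
            }
        }
        return 0;
    }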
- Review / Compile / Run the workshare2 example code
This example demonstrates use of the OpenMP SECTIONS work-sharing construct.
Note how the PARALLEL region is divided into separate sections, each of
which will be executed by one thread. (A sketch appears at the end of this
item.)
- As before, compile and execute the program after reviewing it. For
example:
gcc -fopenmp omp_workshare2.c -o workshare2
./workshare2
- Run the program several times and observe any differences in output.
Because there are only two sections, you should notice that some threads do
not do any work. You may or may not notice that which threads do the work
can vary: the first time, threads 0 and 1 may do the work, and the next
time it may be threads 0 and 3. It is even possible for one thread to do
all of the work. Which thread does the work is non-deterministic in this
case.
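For reference, a SECTIONS construct of the kind this example uses looks
roughly like the sketch below; the provided omp_workshare2.c may differ in
details such as the work done in each section.

    #include <stdio.h>
    #include <omp.h>

    #define N 50

    int main(void)
    {
        float a[N], b[N], c[N], d[N];
        int i;

        for (i = 0; i < N; i++) {
            a[i] = i * 1.5f;
            b[i] = i + 22.35f;
        }

        #pragma omp parallel shared(a, b, c, d) private(i)
        {
            /* Each section is executed by exactly one thread in the team;
               with only two sections, the other threads get no work here. */
            #pragma omp sections nowait
            {
                #pragma omp section
                {
                    printf("Thread %d doing section 1\n", omp_get_thread_num());
                    for (i = 0; i < N; i++)
                        c[i] = a[i] + b[i];
                }

                #pragma omp section
                {
                    printf("Thread %d doing section 2\n", omp_get_thread_num());
                    for (i = 0; i < N; i++)
                        d[i] = a[i] * b[i];
                }
            }
        }
        return 0;
    }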
This completes Exercise 2
Exercise 3
- Review / Compile / Run the orphan example code
This example computes a dot product in parallel; it differs from the
previous examples because the parallel loop construct is orphaned - it is
contained in a subroutine outside the lexical extent of the main program's
parallel region. (See the sketch at the end of this item.)
- After reviewing the source code, compile and run the program. For example:
gcc -fopenmp omp_orphan.c -o orphan
./orphan | sort
- Note the result...and the fact that this example will come back to haunt
you as omp_bug6 later.
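To make the structure concrete, here is a rough sketch of an orphaned
work-sharing construct of the kind omp_orphan.c uses (the actual file will
differ in details): the omp for lives in a function called from inside the
parallel region in main.

    #include <stdio.h>
    #include <omp.h>

    #define VECLEN 100

    float a[VECLEN], b[VECLEN];
    float sum;

    /* Orphaned construct: this omp for is outside the lexical extent
       of the parallel region in main() that invokes it. */
    void dotprod(void)
    {
        int i, tid = omp_get_thread_num();

        #pragma omp for reduction(+:sum)
        for (i = 0; i < VECLEN; i++) {
            sum += a[i] * b[i];
            printf("Thread %d handled i = %d\n", tid, i);
        }
    }

    int main(void)
    {
        int i;

        for (i = 0; i < VECLEN; i++)
            a[i] = b[i] = 1.0f * i;
        sum = 0.0f;

        #pragma omp parallel
        dotprod();                      /* every thread calls dotprod() */

        printf("Sum = %f\n", sum);
        return 0;
    }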
- Get environment information
- Starting from scratch, write a simple program that obtains information
about your OpenMP environment. Alternatively, you can modify the "hello"
program to do this.
- Using the appropriate OpenMP routines/functions, have the master thread
query and print the following:
- The number of processors available
- The number of threads being used
- The maximum number of threads available
- If you are in a parallel region
- If dynamic threads are enabled
- If nested parallelism is supported
- If you need help, you can consult the omp_getEnvInfo
example file.
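For reference, a minimal sketch of these queries is shown below; the
provided omp_getEnvInfo file may organize them differently. (Note:
omp_get_nested reports whether nested parallelism is currently enabled,
which is the closest standard query to "supported".)

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            /* only the master thread (id 0) reports */
            if (omp_get_thread_num() == 0) {
                printf("Number of processors      = %d\n", omp_get_num_procs());
                printf("Number of threads         = %d\n", omp_get_num_threads());
                printf("Max threads               = %d\n", omp_get_max_threads());
                printf("In parallel region?       = %d\n", omp_in_parallel());
                printf("Dynamic threads enabled?  = %d\n", omp_get_dynamic());
                printf("Nested parallelism on?    = %d\n", omp_get_nested());
            }
        }
        return 0;
    }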
- When things go wrong...
There are many things that can go wrong when developing OpenMP programs. The
omp_bugX.X series of programs demonstrates just a few.
See if you can figure out what the problem is with each case and then fix it.
The buggy behavior will differ for each example. Some hints are provided
below.
This completes the exercise.