# Tsunami benchmark

On fitzroy (NIWA's IBM/AIX Power6 supercomputer), follow the [installation
instructions](/src/INSTALL). Then edit `src/examples/tsunami.c`, replace
`/home/popinet/terrain/etopo2` with the path to the terrain data on fitzroy,
e.g. `/hpcf/data/popinet/terrain/etopo2`, and compile with profiling
enabled (`-pg`):

~~~bash
cd src/examples
qcc -O2 -g -pg tsunami.c -o tsunami ../kdt/kdt.o -lm
~~~

Create a LoadLeveler job script, e.g. `tsunami.sh`, containing

~~~bash
#!/bin/bash

#@ job_name         = basilisk
#@ class            = General
#@ job_type         = serial
#@ output           = tsunami.out
#@ error            = tsunami.log
#@ account_no       = HAFS1301
#@ wall_clock_limit = 45:00
#@ environment      = COPY_ALL
#@ queue

./tsunami
~~~

then submit it with

~~~bash
llsubmit tsunami.sh
~~~

Once the run has completed, the last line of the output gives the
performance summary:

~~~bash
tail -n1 tsunami.out
# Quadtree, 2992 steps, 2095.75 CPU, 2260 real, 6.85e+04 points.step/s, 28 var
~~~

Note that the computational speed (in points.step/s) is computed using
the real (wall-clock) runtime, not the CPU time. For reference, the
results on my system `popinet-new` (Intel(R) Core(TM)2 Quad CPU Q9400 @
2.66GHz) are

~~~bash
tail -n1 tsunami.out
# Quadtree, 2992 steps, 1190.36 CPU, 1216 real, 1.28e+05 points.step/s, 28 var
~~~
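
As a rough consistency check on the two summary lines above, multiplying the
reported speed by the real runtime should recover the total number of
point-steps, which ought to be similar on both machines since they run the
same simulation. The sketch below assumes the summary-line layout shown
above; `point_steps` is a hypothetical helper name, not part of Basilisk.

~~~bash
# Hypothetical helper (not part of Basilisk): reads summary lines on stdin and
# prints speed x real time, i.e. the total number of point-steps.
# Assumes the field layout of the summary lines shown above.
point_steps() {
  awk '$10 == "points.step/s," { printf "%.3g\n", $9 * $7 }'
}

# fitzroy
echo '# Quadtree, 2992 steps, 2095.75 CPU, 2260 real, 6.85e+04 points.step/s, 28 var' | point_steps
# popinet-new
echo '# Quadtree, 2992 steps, 1190.36 CPU, 1216 real, 1.28e+05 points.step/s, 28 var' | point_steps
~~~

The two lines give about 1.55e+08 and 1.56e+08 point-steps respectively. On
an actual run, `tail -n1 tsunami.out | point_steps` does the same.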

## Profiling graphs

The graphs below were generated from the `gmon.out` file written by the
profiled (`-pg`) executable, using
[gprof2dot](http://code.google.com/p/jrfonseca/wiki/Gprof2Dot):

~~~bash
gprof tsunami gmon.out | gprof2dot.py | dot -Tpng -o tsunami.dot.png
~~~

- [popinet-new](tsunami/tsunami.dot.png)
- [fitzroy](tsunami/fitzroy.dot.png)

## Parallel run

We will first turn off the generation of movies, as they are
expensive. To do this, edit `tsunami.c` and enclose the `movies` event
in `#if 0` ... `#endif`:

~~~c
#if 0
event movies (t++) {
...
#endif
event do_adapt (i++) adapt();
...
~~~

To use OpenMP on fitzroy, compile with `-qsmp=omp`:

~~~bash
qcc -O2 -qsmp=omp tsunami.c -o tsunami ../kdt/kdt.o -lm
~~~

and submit the job using a script such as the following (here for four
threads)

~~~bash
#!/bin/bash

#@ job_name         = basilisk
#@ class            = General
#@ job_type         = parallel
#@ node             = 1
#@ tasks_per_node   = 4
#@ task_affinity    = core(1)
#@ output           = tsunami-4.out
#@ error            = tsunami-4.log
#@ account_no       = HAFS1301
#@ wall_clock_limit = 30:00
#@ environment      = COPY_ALL
#@ queue

OMP_NUM_THREADS=4 ./tsunami
~~~
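
To repeat the run for other thread counts, the job script above can be
generated rather than edited by hand. The following is a minimal sketch:
`make_job` is a hypothetical helper, the directives are copied from the
script above, and it assumes that `tasks_per_node` and `OMP_NUM_THREADS`
should both equal the thread count (as they do for four threads). Whether a
given count fits on one node is site-specific.

~~~bash
# Hypothetical helper (not part of Basilisk or LoadLeveler): prints the job
# script above with the thread count given as first argument substituted in.
make_job() {
  local n=$1
  cat <<EOF
#!/bin/bash

#@ job_name         = basilisk
#@ class            = General
#@ job_type         = parallel
#@ node             = 1
#@ tasks_per_node   = $n
#@ task_affinity    = core(1)
#@ output           = tsunami-$n.out
#@ error            = tsunami-$n.log
#@ account_no       = HAFS1301
#@ wall_clock_limit = 30:00
#@ environment      = COPY_ALL
#@ queue

OMP_NUM_THREADS=$n ./tsunami
EOF
}

# Show the last line of the script generated for 8 threads
make_job 8 | tail -n1
~~~

`make_job 8 > tsunami-8.sh` followed by `llsubmit tsunami-8.sh` would then
submit the 8-thread run.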

The results on `fitzroy` are

~~~bash
1 core
# Quadtree, 2718 steps, 838.585 CPU, 921.3 real, 1.52e+05 points.step/s, 28 var
4 cores
# Quadtree, 2718 steps, 1030.61 CPU, 642.4 real, 2.17e+05 points.step/s, 28 var
8 cores
# Quadtree, 2718 steps, 1224.36 CPU, 536.5 real, 2.6e+05 points.step/s, 28 var
~~~

The results on `popinet-new` are

~~~bash
1 core
# Quadtree, 2719 steps, 643.21 CPU, 647.8 real, 2.17e+05 points.step/s, 28 var
4 cores
# Quadtree, 2719 steps, 1398.57 CPU, 388.8 real, 3.62e+05 points.step/s, 28 var
~~~

The results on `mesu` (UPMC's cluster) are

~~~bash
1 core
# Quadtree, 2810 steps, 451.53 CPU, 451.9 real, 3.78e+05 points.step/s, 28 var
2 cores
# Quadtree, 2810 steps, 512.59 CPU, 315.4 real, 5.42e+05 points.step/s, 28 var
4 cores
# Quadtree, 2810 steps, 567.32 CPU, 229.8 real, 7.44e+05 points.step/s, 28 var
8 cores
# Quadtree, 2810 steps, 546.32 CPU, 147.1 real, 1.16e+06 points.step/s, 28 var
16 cores
# Quadtree, 2810 steps, 2236.47 CPU, 292.1 real, 5.85e+05 points.step/s, 28 var
32 cores
# Quadtree, 2810 steps, 5255.56 CPU, 361.4 real, 4.73e+05 points.step/s, 28 var
64 cores
# Quadtree, 2810 steps, 13266.5 CPU, 410.9 real, 4.16e+05 points.step/s, 28 var
~~~
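
The scaling on `mesu` is easier to read as speedup (single-core real time
divided by the N-core real time) and parallel efficiency (speedup divided by
N). The sketch below computes both from a listing in the format used above,
here for a subset of the `mesu` runs. `scaling` is a hypothetical helper
name, and it assumes that the first run in the listing is the single-core
baseline.

~~~bash
# Hypothetical helper (not part of Basilisk): reads "N core(s)" headers followed
# by summary lines and prints: cores, speedup, parallel efficiency.
# Assumes the first run in the listing is the single-core baseline.
scaling() {
  awk '
    $2 ~ /^cores?$/ { n = $1 }             # "8 cores" header: remember N
    $10 == "points.step/s," {              # summary line: $7 is the real time
      if (t1 == 0) t1 = $7                 # first run gives the baseline
      s = t1 / $7
      printf "%d %.2f %.2f\n", n, s, s / n
    }'
}

scaling <<'EOF'
1 core
# Quadtree, 2810 steps, 451.53 CPU, 451.9 real, 3.78e+05 points.step/s, 28 var
4 cores
# Quadtree, 2810 steps, 567.32 CPU, 229.8 real, 7.44e+05 points.step/s, 28 var
8 cores
# Quadtree, 2810 steps, 546.32 CPU, 147.1 real, 1.16e+06 points.step/s, 28 var
64 cores
# Quadtree, 2810 steps, 13266.5 CPU, 410.9 real, 4.16e+05 points.step/s, 28 var
EOF
~~~

For these runs the speedup is about 3.07 on 8 cores (efficiency 0.38) but
only 1.10 on 64 cores, consistent with the real times above increasing again
beyond 8 cores.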

The results on `heyward.dalembert.upmc.fr` are

~~~bash
1 core
# Quadtree, 2810 steps, 898.34 CPU, 898.9 real, 1.89e+05 points.step/s, 28 var
8 cores
# Quadtree, 2810 steps, 1692.4 CPU, 393.6 real, 4.31e+05 points.step/s, 28 var
~~~
