Standard lattice layout:
 4 dimensions
 Node remapping: TRIVIAL (no effort made to reorder)

 Sites on node: 32 x 32 x 32 x 32
 Processor layout: 1 x 1 x 1 x 1
Matrix * Matrix: 28.75 ms
Vector * Matrix: 10.0781 ms
Vector square sum: 2.28516 ms
Dirac 4 dirs: 111.25 ms
Dirac: 100 ms
CG: 137.5 ms / iteration
 COMMS from node 0: 68 done, 400 (85.4701%) optimized away
