Standard lattice layout:
 4 dimensions
 Node remapping: TRIVIAL (no effort made to reorder)

 Sites on node: 32 x 32 x 32 x 16
 Processor layout: 1 x 1 x 1 x 2
Matrix * Matrix: 14.6875 ms
Vector * Matrix: 5.07812 ms
Vector square sum: 1.16211 ms
Dirac 4 dirs: 56.25 ms
Dirac: 50.625 ms
CG: 70.625 ms / iteration
 COMMS from node 0: 132 done, 816 (86.0759%) optimized away
