Cellfinder installation in a cluster computing environment.

Currently, cellfinder has only been written with SLURM in mind. In principle, it should be straightforward to support other job schedulers.

SLURM

These instructions are based on the SWC SLURM cluster, so the exact command syntax will likely vary on your system. In particular, you are unlikely to have modules configured in exactly the same way as we do.

Prepare the environment

  • On our cluster, modules are only available on compute nodes, so start an interactive job on a GPU node and request a GPU for testing:

    srun -p gpu --gres=gpu:1 --pty bash

  • Load miniconda:

    module load miniconda

Set up conda environment and install cellfinder

  • From here, you can proceed as with a local installation:

    • Create and activate a new, minimal conda environment:

        conda create --name cellfinder python=3.7
        conda activate cellfinder
    
    • Install CUDA and cuDNN

        conda install cudatoolkit=10.1 cudnn
    
    • Install cellfinder

        pip install cellfinder
    
  • Check that TensorFlow and CUDA are configured properly. Start Python:

    python

    and run:

       import tensorflow as tf
       tf.test.is_gpu_available()
    

    If the output looks something like the following, ending in True, then TensorFlow can see the GPU and all is well.

        2019-06-26 10:51:34.697900: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
        2019-06-26 10:51:34.881183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
        name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
        pciBusID: 0000:2d:00.0
        totalMemory: 23.62GiB freeMemory: 504.25MiB
        2019-06-26 10:51:34.881217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
        2019-06-26 10:51:35.251465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
        2019-06-26 10:51:35.251505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
        2019-06-26 10:51:35.251511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
        2019-06-26 10:51:35.251729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 195 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:2d:00.0, compute capability: 7.5)
        True
    
  • End your interactive job:

    exit
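The interactive check above can also be scripted, for example at the start of a batch job, so that the job output records whether TensorFlow saw a GPU. This is a minimal sketch, assuming the cellfinder conda environment is already active; the check_gpu function name is our own, hypothetical choice and is not part of cellfinder.

```shell
#!/bin/bash
# Non-interactive version of the GPU check above.
# Assumes the cellfinder conda environment is already activated.
# check_gpu is a hypothetical helper name, not part of cellfinder.
check_gpu() {
    # Exit status 0 if TensorFlow reports a usable GPU, non-zero otherwise
    # (including when TensorFlow cannot be imported at all)
    python -c "import sys, tensorflow as tf; sys.exit(0 if tf.test.is_gpu_available() else 1)"
}

if check_gpu; then
    echo "GPU OK"
else
    echo "No GPU visible to TensorFlow" >&2
fi
```

Because the function only reports through its exit status, a job script could instead use check_gpu || exit 1 to stop early when no GPU is available.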

Run cellfinder

Although you can run cellfinder interactively, it is better to submit a batch job, which keeps running after you log out.

  • Write the job submission script. An example can be found here. If possible, set the output directory to local, fast scratch storage.

  • Submit the job to the scheduler:

    sbatch cellfinder_sbatch.sh

  • If you use the example script, you will receive an email when the job is done. To watch its progress, log in to a node that has the same storage drive mounted and run:

    watch tail -n 100 /path/to/cellfinder_log.log

  • Copy the results from the storage platform.
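For reference, a job submission script of the kind described above might look like the following minimal sketch. It reuses the partition, GPU request and module name from the interactive commands earlier in this section. The memory, time limit, email address and paths are placeholders, and the -s, -b and -o cellfinder arguments (signal images, background images, output directory) are our assumption; check cellfinder -h for the options your installed version accepts.

```shell
#!/bin/bash
# Hypothetical SLURM job script for cellfinder (e.g. cellfinder_sbatch.sh).
# Resource values, paths and the email address are placeholders.

# Partition and GPU request, matching the interactive srun command above
#SBATCH -p gpu
#SBATCH --gres=gpu:1
# Single node; memory and wall-clock limit (1 day) are assumed values to tune
#SBATCH -N 1
#SBATCH --mem=60G
#SBATCH -t 1-00:00
# Job stdout/stderr files (%N = node name, %j = job ID)
#SBATCH -o cellfinder.%N.%j.out
#SBATCH -e cellfinder.%N.%j.err
# Email notifications; replace with your own address
#SBATCH --mail-type=ALL
#SBATCH --mail-user=user@example.com

# On our cluster, modules are only available on compute nodes
module load miniconda

# Enable "conda activate" in a non-interactive shell, then activate
eval "$(conda shell.bash hook)"
conda activate cellfinder

# Placeholder paths; ideally point the output at fast local scratch
SIGNAL_DIR=/path/to/signal_images
BACKGROUND_DIR=/path/to/background_images
OUTPUT_DIR=/path/to/scratch/cellfinder_output

# Assumed flags: confirm with "cellfinder -h" and add any other required arguments
cellfinder -s "$SIGNAL_DIR" -b "$BACKGROUND_DIR" -o "$OUTPUT_DIR"
```

The #SBATCH lines are read by sbatch only before the first executable command, which is why they all sit at the top of the script. Once submitted, squeue -u "$USER" shows whether the job is still pending or already running.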