Exercise: simple batch job

The aim of this exercise is to get you familiar with the torque client tools for submitting and managing cluster jobs. We will firstly create a script that calls the sleep command for a given period of time. After that, we are going to submit the script as jobs to the cluster.

Tasks

Note

DO NOT just copy-n-paste the commands for the hands-on exercises!! Typing (and eventually making typos) is an essential part of the learning process.

  1. make a script called run_sleep.sh with the following content:

    #!/bin/bash
    
    my_host=$( /bin/hostname )
    
    time=$( date )
    echo "$time: $my_host falls asleep ..."
    
    sleep $1
    
    time=$( date )
    echo "$time: $my_host wakes up."
    

    Note

    Input argument of a bash script is accessible via variable $n where n is an integer referring to the n-th variable given the the script. In the script above, the value $1 on the line sleep $1 refers to the first argument given the the script. For instance, if you run the script as run_sleep.sh 10, the value of $1 is 10.

  2. make sure the script runs locally

    $ chmod +x run_sleep.sh
    $ ./run_sleep.sh 1
    Mon Sep 28 16:36:28 CEST 2015: dccn-c007.dccn.nl falls asleep ...
    Mon Sep 28 16:36:29 CEST 2015: dccn-c007.dccn.nl wakes up.
    
  3. submit a job to run the script

    $ echo "$PWD/run_sleep.sh 60" | qsub -N 'sleep_1m' -l 'nodes=1:ppn=1,mem=10mb,walltime=00:01:30'
    6928945.dccn-l029.dccn.nl
    
  4. check the job status. For example,

    $ qstat 6928945
    

    Note

    The torque job id given here should be replaced accordingly.

  5. or monitor it until it is complete

    $ watch qstat 6928945
    

    Tip

    The watch command is used here to repeat the qstat command every 2 seconds. Press Control-c to quit the watch program when the job is finished.

  6. examine the output file, e.g. sleep_10.o6928945, and find out the resource consumption of this job

    $ cat sleep_1m.o6928945 | grep 'Used resources'
    Used resources:    cput=00:00:00,mem=4288kb,vmem=433992kb,walltime=00:01:00
    
  7. submit another job to run the script, with longer duration of sleep. For example,

    $ echo "$PWD/run_sleep.sh 3600" | qsub -N 'sleep_1h' -l 'nodes=1:ppn=1,mem=10mb,walltime=01:10:00'
    6928946.dccn-l029.dccn.nl
    

    Note

    Try to compare the command in step 3. As we expect the job to run longer, the requirement on the job walltime is also extended to 1 hour 10 minutes.

  8. Ok, we don’t want to wait for the 1-hour job to finish. Let’s cancel the job. For example,

    $ qdel 6928946