Distributed data analysis with Matlab

Note

If you are missing the context of this exercise, please refer to Exercise: distributed data analysis.

Preparation

Using the commands below to download the exercise package and check its content.

$ cd
$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_da/torque_exercise.tgz
$ tar xvzf torque_exercise.tgz
$ cd torque_exercise
$ ls
subject_0  subject_1  subject_2  subject_3  subject_4  subject_5 ...

In the package, there are folders for subject data (i.e. subject_{0..5}). In each subject folder, there is a data file containing an encrypted string (URL) pointing to the subject’s photo on the Internet.

In this fake analysis, we are going to find out who our subjects are, using an trivial “analysis algorithm” that does the following two steps in each subject folder:

  1. decrypting the URL string, and

  2. downloading the subject’s photo.

Before you start, get into the directory of the torque_exercise and run

$ ./clean.sh

to remove previously produced results.

Optionally, read the Matlab program run_analysis.m and try to get an idea how to use it. Don’t spend too much time in understanding every detail.

Tip

The script consists of a Matlab function run_analysis encapsulating the data-analysis algorithm. The function takes one input argument, the subject id.

Task 1: Using matlab_sub

  1. check the Matlab wrapper script run_subject_0.m.

    Note

    Since the matlab_sub command doesn’t take any input argument, we cannot submit the run_analysis.m directly with matlab_sub.

    A workaround is to provide another script in which the function defined in run_analysis.m is called with a hard-coded argument; therefore the run_subject_0.m script.

  2. submit a job via matlab_sub to run the run_subject_0.m with Matlab.

    $ matlab_sub ./run_subject_0.m
    

    and follow the instructions in the terminal to provide required walltime and memory. At the end, a job will be submitted.

    Wait for the job to finish, and check if you get the output photo in the subject_0 directory.

    $ ls -l subject_0/photo.jpg
    
  3. clean the data

    $ ./clean.sh
    

Task 2: Using qsubcellfun

  1. In your VNC session, start the Matlab desktop GUI like the commands below:

    $ cd ~/torque_exercise
    $ matlab
    

    In the popup dialog, you could specify the amount of resources you need. In this exercise, we simply click through it. It will submit an interactive job to run the Matlab desktop on a compute node. Wait for the Matlab desktop to show up in your VNC session.

    Note

    From this point on, we will work within the Matlab desktop GUI. All commands in the following steps should be done within the command window of Matlab.

  2. Change the current workding directory to the torque_exercise directory in the command window:

    >> cd ~/torque_exercise
    >> ls
    clean.sh  run_analysis.m  subject_0     subject_1  subject_2  subject_3  subject_4  subject_5 ...
    
  3. Load the qsub toolbox of fieldtrip

    >> addpath '/home/common/matlab/fieldtrip/qsub'
    
  4. Test run over the 6 subjects sequentically with the Matlab’s cellfun function

    >> out = {}
    >> ids = num2cell(0:5)
    >> out = cellfun(@run_analysis, ids, 'UniformOutput', false)
    

    Note

    Since the the run_analysis function returns the path of the subject’s photo, the variable out will be an array of 6 paths, each for a subject’s photo.

    After the prompt is returned successfully, you should see the photos of all 6 subjects. Let’s list all of them based on the returned out

    >> for o = out
    system(sprintf("ls -l %s", o{1}));
    end
    

    Clean up the output

    >> system('./clean.sh')
    
  5. Run over the 6 subjects in parallel with the qsubcellfun function

    >> out = {}
    >> ids = num2cell(0:5)
    >> out = qsubcellfun(@run_analysis, ids, 'memreq', 1024^3, 'timreq', 300, 'stack', 1)
    

    In this case, 6 jobs are submitted, each runs the analysis on a subject. The resource requirements of each job are 300 secs walltime and 1 GB memory.

    The prompt will return only if all jobs are all finished.

    List the output files based on the returned out variable:

    >> for o = out
    system(sprintf("ls -l %s", o{1}));
    end
    

    Clean up the output:

    >> system('./clean.sh')
    

Task 3: Using qsubfeval and qsubget

Instead of using qsubcellfun shown in Task 1. We could also use the combination of qsubfeval and qsubget. This approach puts submitted jobs in the background without blocking the command window.

  1. Make sure you are in the right working directory, and having the qsubtoolbox loaded in Matlab:

    >> cd ~/torque_exercise
    >> addpath '/home/common/matlab/fieldtrip/qsub/'
    
  2. Submit jobs to run the 6 subjects in parallel using qsubfeval

    >> jobs = {};
    >> for id = 0:5
    jobs{id+1} = qsubfeval(@run_analysis, id,  'memreq',  1024^3,  'timreq',  300);
    end
    
    >> % save recorded job identifiers to file %
    >> save 'jobs.mat' jobs
    
  3. Check job status

    >> % load job identifiers from file %
    >> load 'jobs.mat'
    >> for j = jobs
    jid = qsublist('getpbsid', j);
    system(sprintf('qstat %s', jid));
    end
    

    Tip

    The idea of the for loop above is to find the Torque job id from the job identifier returned from qsubfeval using the qsublist function. With the Torque job id, the job status is retrieved by making a qstat system call.

    Since checking over jobs is a regular task, there is a small function called check_jobs.m in the exercise package. Instead of typing the foor-loop everytime, you could also call:

    >> check_jobs(jobs);
    

    Repeat the for-loop of checking job status until all jobs are reported completed (i.e. in status C).

  4. Check job output using qsubget

    >> out = {};
    >> for j = jobs
    out = [out, qsubget(j{:})];
    end
    
    >> out
    

    Now we have extracted the out from the submitted jobs. Let’s list the output files based on it to see if we get the subject’s photos:

    >> for o = out
    system(sprintf("ls -l %s", o{1}));
    end
    

    Clean up the output

    >> system('./clean.sh')