Distributed data analysis with Matlab
Note
If you are missing the context of this exercise, please refer to Exercise: distributed data analysis.
Preparation
Use the commands below to download the exercise package and check its contents.
$ cd
$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_da/hpc_exercise_slurm.tgz
$ tar xvzf hpc_exercise_slurm.tgz
$ cd hpc_exercise_slurm
$ ls
subject_0 subject_1 subject_2 subject_3 subject_4 subject_5 ...
In the package, there are folders for subject data (i.e. subject_{0..5}). In each subject folder, there is a data file containing an encrypted string (URL) pointing to the subject’s photo on the Internet.
In this fake analysis, we are going to find out who our subjects are, using a trivial “analysis algorithm” that performs the following two steps in each subject folder:
decrypting the URL string, and
downloading the subject’s photo.
Before you start, change into the hpc_exercise_slurm directory and run
$ ./clean.sh
to remove previously produced results.
Optionally, read the Matlab program run_analysis.m and try to get an idea of how to use it. Don’t spend too much time on understanding every detail.
Tip
The script consists of a Matlab function run_analysis encapsulating the data-analysis algorithm. The function takes one input argument, the subject id.
Task 1: Using matlab_sub
Check the Matlab wrapper script run_subject_0.m.
Note
Since the matlab_sub command does not pass any input argument to the Matlab code it runs, we cannot submit run_analysis.m directly with matlab_sub. A workaround is to provide another script in which the function defined in run_analysis.m is called with a hard-coded argument; hence the run_subject_0.m script.
Submit a job via matlab_sub to run run_subject_0.m with Matlab:
$ matlab_sub ./run_subject_0.m
Follow the instructions in the terminal to provide the required walltime and memory. At the end, a job will be submitted.
Wait for the job to finish, and check that the output photo is in the subject_0 directory:
$ ls -l subject_0/photo.jpg
Clean the data:
$ ./clean.sh
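The wrapper-script workaround from this task extends to the other subjects. The following is a minimal shell sketch, assuming that a wrapper needs nothing more than a single call to run_analysis with a fixed subject id; the real run_subject_0.m in the package may contain more. The function name make_wrappers is hypothetical.

```shell
#!/bin/sh
# Sketch: generate one Matlab wrapper script per subject, so that each can be
# submitted with matlab_sub.
# Assumption: a wrapper only needs to call run_analysis with a hard-coded id.

make_wrappers() {
    # $1: target directory, $2: first subject id, $3: last subject id
    dir=$1
    id=$2
    while [ "$id" -le "$3" ]; do
        # Writes e.g. "run_analysis(3);" into run_subject_3.m
        printf 'run_analysis(%d);\n' "$id" > "$dir/run_subject_$id.m"
        id=$((id + 1))
    done
}

# Example (commented out, because it would overwrite the packaged run_subject_0.m):
# make_wrappers . 0 5
```

Each generated script could then be submitted in the same way as run_subject_0.m, for example with matlab_sub ./run_subject_1.m.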
Task 2: Using qsubcellfun
In your VNC session, start the Matlab desktop GUI with the commands below:
$ cd ~/hpc_exercise_slurm
$ matlab
In the popup dialog, you can specify the amount of resources you need. In this exercise, we simply click through it. It will submit an interactive job to run the Matlab desktop on a compute node. Wait for the Matlab desktop to show up in your VNC session.
Note
From this point on, we will work within the Matlab desktop GUI. All commands in the following steps should be done within the command window of Matlab.
Change the current working directory to the hpc_exercise_slurm directory in the command window:
>> cd ~/hpc_exercise_slurm
>> ls
clean.sh run_analysis.m subject_0 subject_1 subject_2 subject_3 subject_4 subject_5 ...
Load the qsub toolbox of FieldTrip:
>> addpath '/home/common/matlab/fieldtrip/qsub'
Test-run the 6 subjects sequentially with Matlab’s cellfun function:
>> out = {}
>> ids = num2cell(0:5)
>> out = cellfun(@run_analysis, ids, 'UniformOutput', false)
Note
Since the run_analysis function returns the path of the subject’s photo, the variable out will be a cell array of 6 paths, one for each subject’s photo.
After the prompt has returned successfully, you should have the photos of all 6 subjects. Let’s list them all based on the returned out:
>> for o = out
     system(sprintf("ls -l %s", o{1}));
   end
Clean up the output
>> system('./clean.sh')
Run over the 6 subjects in parallel with the qsubcellfun function:
>> out = {}
>> ids = num2cell(0:5)
>> out = qsubcellfun(@run_analysis, ids, 'memreq', 1024^3, 'timreq', 300, 'stack', 1)
In this case, 6 jobs are submitted, each runs the analysis on a subject. The resource requirements of each job are 300 secs walltime and 1 GB memory.
The prompt returns only after all jobs have finished.
List the output files based on the returned out variable:
>> for o = out
     system(sprintf("ls -l %s", o{1}));
   end
Clean up the output:
>> system('./clean.sh')
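The output listing in this task can also be cross-checked from a terminal outside Matlab, before the clean-up step is run. The sketch below assumes that every subject's photo is saved as photo.jpg inside its own folder, as seen for subject_0 in Task 1; the function name missing_photos is hypothetical.

```shell
#!/bin/sh
# Sketch: report the subject folders that do not yet contain a photo.
# Assumption: each result is stored as <subject folder>/photo.jpg.

missing_photos() {
    # $1: directory holding the subject_* folders
    for d in "$1"/subject_*/; do
        [ -d "$d" ] || continue                   # pattern matched nothing
        [ -f "${d}photo.jpg" ] || basename "$d"   # print folders without a photo
    done
}

# Example (commented out): check the exercise directory.
# missing_photos ~/hpc_exercise_slurm
```

An empty output would mean that all subject folders contain a photo; after running clean.sh, all six folders are expected to be listed again.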
Task 3: Using qsubfeval and qsubget
Instead of using qsubcellfun as shown in Task 2, we could also use the combination of qsubfeval and qsubget. This approach puts the submitted jobs in the background without blocking the command window.
Make sure you are in the right working directory and have the qsub toolbox loaded in Matlab:
>> cd ~/hpc_exercise_slurm
>> addpath '/home/common/matlab/fieldtrip/qsub/'
Submit jobs to run the 6 subjects in parallel using qsubfeval:
>> jobs = {};
>> for id = 0:5
     jobs{id+1} = qsubfeval(@run_analysis, id, 'memreq', 1024^3, 'timreq', 300);
   end
>> % save recorded job identifiers to file
>> save 'jobs.mat' jobs
Check the job status:
>> % load job identifiers from file
>> load 'jobs.mat'
>> for j = jobs
     jid = qsublist('getpbsid', j);
     system(sprintf('scontrol show job %s', jid));
   end
Tip
The idea of the for-loop above is to find the Slurm job id from the job identifier returned by qsubfeval, using the qsublist function. With the Slurm job id, the job details are retrieved by making a scontrol show job system call.
Since checking on jobs is a regular task, there is a small function called check_jobs.m in the exercise package. Instead of typing the for-loop every time, you could also call:
>> check_jobs(jobs);
Repeat the for-loop of checking job status until all jobs are reported completed (i.e. in status C).
Check the job output using qsubget:
>> out = {};
>> for j = jobs
     out = [out, qsubget(j{:})];
   end
>> out
Now we have extracted out from the submitted jobs. Let’s list the output files based on it to see if we got the subjects’ photos:
>> for o = out
     system(sprintf("ls -l %s", o{1}));
   end
Clean up the output
>> system('./clean.sh')
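The status check in this task reads the output of scontrol show job by eye. As a rough sketch, the state can also be extracted in a terminal, assuming that the output contains a JobState=... field as in standard Slurm; the function name job_state and the job id in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: print the job state found in `scontrol show job` output.
# Assumption: the output contains a field of the form JobState=<STATE>.

job_state() {
    # Reads the scontrol output on stdin and prints the first JobState value.
    sed -n '/JobState=/{s/.*JobState=\([A-Z_]*\).*/\1/p;q;}'
}

# Example (commented out, needs a Slurm cluster and a real job id):
# scontrol show job 12345 | job_state
```

On a cluster, the same call could be repeated for each job id until every job reports a finished state, before moving on to qsubget.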