Tutorial for using a computer cluster
Using a computer cluster to run some program is useful if your program needs:
- a large amount of CPU or memory resources,
- to run during a large amount of time.
In those cases, you can follow this short tutorial, designed for the ICJ cluster.
Step one: find an available cluster on https://cluster-math.univ-lyon1.fr/.
By clicking on the name of a cluster, you can get more information about the memory and the CPU speed.
You also find how the memory and the CPU are used (for the memory, white and green colours correspond to available memory). For example, we get that about 3GB of memory and 50% of the CPU are used for the cluster 10.
Step two: connect yourself to the cluster. Open a terminal and type (where X is the chosen cluster number):
ssh clusterX-math.univ-lyon1.fr
You then have to confirm the access and then enter your password. If you are outside the university, you can use this command:
ssh username@clusterX-math.univ-lyon1.fr
Step three: browse directories to find your program, using cd
, for example:
cd ~/Desktop/myprogram/
Step four: launch your program. First, you can test that your program runs correctly using:
./myprogram.exe
If you want to stop your program before the end, you can break it with Ctrl+C
.
Then, if your program only take a few minutes, you can run it and write the output in output.txt.
./myprogram.exe >output.txt
On the contrary, if your program take hours or days, you should use the nohup
command to allow your program to run only on the cluster.
nohup ./myprogram.exe &
With nohup
, you can disconnect yourself from the cluster (typing exit
) and even shutdown your computer.
Step five: verify that your program does not use too much memory and CPU. You can check this on https://cluster-math.univ-lyon1.fr/. If a problem occurs, you can “kill” your program. Reconnect on the cluster and then type:
top
to show all launched processes on the cluster. Type u
, your username and then Enter
to find which program you’re actually running. Then press q
to go out.
Find the PID of the problematic program (for example 23927) and kill it with
kill 23927
If you want to kill all your programs on the cluster, type
killall -u username
Related posts:
- More about ssh on tuteurs.ens.fr (in French).
- For ICJ users, you can also see this and this for Matlab users (in French).