
aria-discuss - Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm

Subject: Discussion group for the ARIA software

  • From: Benjamin Bardiaux <bardiaux AT pasteur.fr>
  • To: <benedikt.soeldner AT tu-dortmund.de>
  • Cc: <aria-discuss AT services.cnrs.fr>
  • Subject: Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm
  • Date: Mon, 3 May 2021 11:39:24 +0200

Dear Benedikt,

My bad, I forgot to mention the required "--no-test" option in the aria2 command line when using sbatch submission.
(It disables the host check.)
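
As a minimal sketch of where the option goes, the snippet below assembles the aria2 command line from the job script quoted further down in this thread, with "--no-test" added. The paths are the elided placeholders from the original post, not real locations, and the snippet only prints the command (a dry run) rather than launching ARIA.

```shell
#!/bin/sh
# Sketch only: both values are placeholders copied from the job script in this thread.
ARIA_SCRIPT="/.../programs/aria2.3.2/aria2.py"   # elided path, left as in the original post
PROJECT_XML="Aria-Project_run01.xml"

# --no-test disables ARIA's initial host check, which fails when the host
# command is sbatch.
ARIA_CMD="python -O $ARIA_SCRIPT --no-test --output=run01msg.txt $PROJECT_XML"

# Dry run: print the command for inspection instead of executing it.
echo "$ARIA_CMD"
```

In the actual job script, this command line would take the place of the existing "python -O ... aria2.py" line.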

Best,

Benjamin 

-----------------------------------------------------
Dr Benjamin Bardiaux    | Bioinformatique Structurale
bardiaux AT pasteur.fr     | Institut Pasteur
25,28 rue du Docteur Roux 75015 Paris, France
-----------------------------------------------------

On 3 May 2021 at 10:43, Benedikt Soeldner <benedikt.soeldner AT tu-dortmund.de> wrote:

Dear Benjamin,

Thank you for the tip. I tried it out on Friday, but unfortunately the
following error occurred right at the start:

MESSAGE [Project]: Checking host list ...
WARNING [Job manager]: Command "sbatch --job-name=CNS_Aria_run_2021-04-30
                      --error=/.../cns_error_msg.txt --output=/.../cns_
                      output_msg.txt --partition=short --ntasks=1
                      --cpus-per-task=1" ... failed (connection failed or
                      unable to open temporary file.)
WARNING [Project]: At least one command could not be executed. Please check
                  your host list setup.

Is the reason for this problem perhaps that the check_host.csh file in the
temporary directory doesn't look like a Slurm job script (in
contrast to the files refine.csh and refine_water.csh in the temporary
directory)? If so, does this file need to be modified, or is there
another problem?

Best regards,

Benedikt



Dear Benedikt,

If your Slurm setup allows submission from the node where you're running
aria, you could use

      <host enabled="yes" command="sbatch --your.sbatch.options"
executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
n_cpu="20" use_absolute_path="yes"/>
    </job_manager>


This means that each CNS calculation will be submitted independently via
sbatch and will therefore run in parallel.
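
To illustrate the host entry above with concrete options filled in, here is a small sketch that prints such a line. The sbatch options (partition "short", one task, one CPU per task) are taken from the command quoted in the warning message elsewhere in this thread and are only an example; the executable path is the elided placeholder from the original post, and the variable names are hypothetical.

```shell
#!/bin/sh
# Example values only; adjust the sbatch options to your own cluster.
SBATCH_CMD="sbatch --partition=short --ntasks=1 --cpus-per-task=1"
CNS_EXE="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"  # elided path, as posted
N_CPU=20   # same n_cpu value as in the project file quoted in this thread

# Print the <host> element to paste inside <job_manager> in the ARIA project file.
printf '<host enabled="yes" command="%s" executable="%s" n_cpu="%s" use_absolute_path="yes"/>\n' \
  "$SBATCH_CMD" "$CNS_EXE" "$N_CPU"
```

Since each CNS calculation then becomes its own Slurm job, the job that runs aria itself presumably no longer needs all 20 CPUs, but that is an assumption to check against your cluster's policies.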

Best regards,

Benjamin


On 30/04/2021 11:19, Benedikt Soeldner wrote:
Dear Aria discussion group,

For a few months now, I have been running my Aria calculations on a cluster
managed with the Slurm scheduling system. Unfortunately, I have not yet found
out how to run Aria, i.e., the CNS calculation processes, on multiple nodes
in parallel. Do any of you know whether and how this works?

At the moment, I start my Aria calculations by submitting a job script,
which looks like the following:

#!/bin/bash -l
#SBATCH --job-name=Aria-Project_run01
#SBATCH --output=Aria-Project_run01.out
#SBATCH --error=Aria-Project_run01.err
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=8G
module purge
module load python/2.7.18
cd /.../Aria-Project
srun python -O /.../programs/aria2.3.2/aria2.py --output=run01msg.txt Aria-Project_run01.xml

And in the Aria project file, the following lines are written:

    <job_manager default_command="csh -f">
      <host enabled="yes" command="csh -f"
executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
n_cpu="20" use_absolute_path="yes"/>
    </job_manager>

When I increase the number of nodes (#SBATCH --nodes=... in the job
script), the job still runs on just one node, as there is only one initial
task. I guess the lines above from the project file also need to be
modified, not only the Slurm job script. Do you know what
these two files should look like? Or do I need to modify other files in
addition?

Thank you for your help!

Best regards,

Benedikt Söldner

--------------------------------------
Benedikt Söldner (PhD student)
benedikt.soeldner AT tu-dortmund.de
Technical University Dortmund, Germany
Research group of Prof. Dr. Rasmus Linser








--
---------------------------------------------
Dr Benjamin Bardiaux      bardiaux AT pasteur.fr
Unité de Bioinformatique Structurale
CNRS UMR3528 - Institut Pasteur
25,28 rue du Docteur Roux 75015 Paris, France
---------------------------------------------





