
Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm


  • From: "Benedikt Soeldner" <benedikt.soeldner AT tu-dortmund.de>
  • To: bardiaux AT pasteur.fr
  • Cc: aria-discuss AT services.cnrs.fr
  • Subject: Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm
  • Date: Mon, 3 May 2021 10:43:16 +0200
  • Importance: Normal

Dear Benjamin,

thank you for the tip. I tried it out on Friday, but unfortunately the run
failed right at the beginning with the following error:

MESSAGE [Project]: Checking host list ...
WARNING [Job manager]: Command "sbatch --job-name=CNS_Aria_run_2021-04-30
--error=/.../cns_error_msg.txt --output=/.../cns_output_msg.txt
--partition=short --ntasks=1 --cpus-per-task=1" ... failed
(connection failed or unable to open temporary file.)
WARNING [Project]: At least one command could not be executed. Please check
your host list setup.
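
To narrow this down, I guess one could first check by hand whether plain
sbatch submission works at all from the node where aria2.py runs. The
following is just my own guess at a minimal test (not anything from the
ARIA documentation); --wrap turns the given command into a one-line batch
script, so no script file is needed for the test:

    # run on the node where aria2.py is executed
    sbatch --job-name=sbatch_test --partition=short --ntasks=1 \
           --cpus-per-task=1 --output=sbatch_test.out \
           --error=sbatch_test.err --wrap="hostname"
    squeue -u "$USER"   # the test job should show up here if submission worked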

Could the reason for this problem be that the check_host.csh file in the
temporary directory doesn't look like a Slurm job script (in contrast to
the files refine.csh and refine_water.csh in the same directory)? If so,
does this file need to be modified, or is there another problem?
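
To illustrate what I mean: my (possibly wrong) assumption is that the job
manager simply appends the generated .csh script to the host command, so a
tiny wrapper around "sbatch --wrap" might sidestep the need for the script
to be a valid batch file. The wrapper below is purely hypothetical, nothing
in it comes from ARIA, and the sbatch options are just the ones from my
setup:

    #!/bin/bash
    # slurm_csh.sh - hypothetical wrapper: submit the csh script that the
    # ARIA job manager passes as the first argument as its own Slurm job
    exec sbatch --partition=short --ntasks=1 --cpus-per-task=1 \
         --output="$1.slurm.out" --error="$1.slurm.err" \
         --wrap="csh -f $1"

It would then be referenced as command="/path/to/slurm_csh.sh" in the host
line of the project file, but again, whether ARIA invokes the command that
way is only my assumption.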

Best regards,

Benedikt



> Dear Benedikt,
>
> If your Slurm setup allows submission from the node where you're running
> aria, you could use
>
> <job_manager default_command="csh -f">
> <host enabled="yes" command="sbatch --your.sbatch.options"
> executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
> n_cpu="20" use_absolute_path="yes"/>
> </job_manager>
>
>
> This means that each CNS calculation will be submitted independently via
> sbatch, hence running in parallel.
>
> Best regards,
>
> Benjamin
>
>
> On 30/04/2021 11:19, Benedikt Soeldner wrote:
>> Dear Aria discussion group,
>>
>> For a few months now, I have been running my Aria calculations on a
>> cluster managed with the Slurm scheduling system. Unfortunately, I
>> haven't yet figured out how to run Aria, i.e. the CNS calculation
>> processes, on multiple nodes in parallel. Does any of you know if and
>> how this works?
>>
>> At the moment, I start my Aria calculations by submitting a job script,
>> which looks like the following:
>>
>> #!/bin/bash -l
>> #SBATCH --job-name=Aria-Project_run01
>> #SBATCH --output=Aria-Project_run01.out
>> #SBATCH --error=Aria-Project_run01.err
>> #SBATCH --partition=short
>> #SBATCH --ntasks=1
>> #SBATCH --cpus-per-task=20
>> #SBATCH --mem=8G
>> module purge
>> module load python/2.7.18
>> cd /.../Aria-Project
>> srun python -O /.../programs/aria2.3.2/aria2.py --output=run01msg.txt \
>>     Aria-Project_run01.xml
>>
>> And the Aria project file contains the following lines:
>>
>> <job_manager default_command="csh -f">
>> <host enabled="yes" command="csh -f"
>> executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
>> n_cpu="20" use_absolute_path="yes"/>
>> </job_manager>
>>
>> When I increase the number of nodes (#SBATCH --nodes=... in the job
>> script), the job still runs on just one node, as there is only one
>> initial task. I guess the lines above from the project file also need
>> to be modified somehow, and not only the Slurm job script. Do you know
>> what these two files should look like? Or do I need to modify other
>> files in addition?
>>
>> Thank you for your help!
>>
>> Best regards,
>>
>> Benedikt Söldner
>>
>> --------------------------------------
>> Benedikt Söldner (PhD student)
>> benedikt.soeldner AT tu-dortmund.de
>> Technical University Dortmund, Germany
>> Research group of Prof. Dr. Rasmus Linser
>>
>
> --
> ---------------------------------------------
> Dr Benjamin Bardiaux bardiaux AT pasteur.fr
> Unité de Bioinformatique Structurale
> CNRS UMR3528 - Institut Pasteur
> 25,28 rue du Docteur Roux 75015 Paris, France
> ---------------------------------------------
>




