Subject: Discussion group for the ARIA software
Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm
- From: Benjamin Bardiaux <bardiaux AT pasteur.fr>
- To: <benedikt.soeldner AT tu-dortmund.de>
- Cc: <aria-discuss AT services.cnrs.fr>
- Subject: Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm
- Date: Mon, 3 May 2021 11:39:24 +0200
Dear Benedikt,
My bad, I forgot to mention the required "--no-test" option on the aria2 command line when using sbatch submission.
(It disables the host check.)
Best,
Benjamin
-----------------------------------------------------
Dr Benjamin Bardiaux | Bioinformatique Structurale
bardiaux AT pasteur.fr | Institut Pasteur
25,28 rue du Docteur Roux 75015 Paris, France
-----------------------------------------------------
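
As a rough sketch of the tip above, the ARIA call from the job script quoted further down in this thread could be extended with "--no-test" as follows. The script only prints the command instead of running it. The paths are copied from that job script (the "/.../" part is elided there and stays elided here), and placing "--no-test" before "--output" is an assumption, not something stated in the thread.

```shell
#!/bin/bash
# Sketch only: prints the ARIA command line instead of running it.
# Paths are copied from the job script quoted in this thread;
# "/.../" is elided in the original and must be filled in locally.
ARIA_PY="/.../programs/aria2.3.2/aria2.py"
PROJECT_XML="Aria-Project_run01.xml"

# Build the command with --no-test, which disables the host check.
# The position of --no-test among the options is an assumption.
aria_cmd() {
    echo "python -O ${ARIA_PY} --no-test --output=run01msg.txt ${PROJECT_XML}"
}

aria_cmd
```

Once the printed line looks right, it can replace the corresponding line in the job script.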
On 3 May 2021 at 10:43, Benedikt Soeldner <benedikt.soeldner AT tu-dortmund.de> wrote:
Dear Benjamin,
thank you for the tip. I tried it out on Friday, but unfortunately
the following error occurred right at the beginning:
MESSAGE [Project]: Checking host list ...
WARNING [Job manager]: Command "sbatch --job-name=CNS_Aria_run_2021-04-30 --error=/.../cns_error_msg.txt --output=/.../cns_output_msg.txt --partition=short --ntasks=1 --cpus-per-task=1" ... failed (connection failed or unable to open temporary file.)
WARNING [Project]: At least one command could not be executed. Please check your host list setup.
Could the reason for this problem be that the check_host.csh file in the
temporary directory doesn't look like a Slurm job script (in
contrast to the files refine.csh and refine_water.csh in the temporary
directory)? If so, does this file need to be modified, or is there
another problem?
Best regards,
Benedikt

Dear Benedikt,

If your Slurm setup allows submission from the node where you're running
aria, you could use

<host enabled="yes" command="sbatch --your.sbatch.options"
executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
n_cpu="20" use_absolute_path="yes"/>
</job_manager>

This means that each CNS calculation will be submitted independently via
sbatch, hence running in parallel.

Best regards,
Benjamin

On 30/04/2021 11:19, Benedikt Soeldner wrote:

Dear Aria discussion group,

For a few months, I have been running my Aria calculations on a cluster
managed with the Slurm scheduling system. Unfortunately, I have not yet
found out how to run Aria, i.e., the CNS calculation processes, on multiple
nodes in parallel. Does any of you know if and how this works?

At the moment, I start my Aria calculations by submitting a job script,
which looks like the following:

#!/bin/bash -l
#SBATCH --job-name=Aria-Project_run01
#SBATCH --output=Aria-Project_run01.out
#SBATCH --error=Aria-Project_run01.err
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20
#SBATCH --mem=8G
module purge
module load python/2.7.18
cd /.../Aria-Project
srun python -O /.../programs/aria2.3.2/aria2.py --output=run01msg.txt Aria-Project_run01.xml

And in the Aria project file, the following lines are written:

<job_manager default_command="csh -f">
<host enabled="yes" command="csh -f"
executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
n_cpu="20" use_absolute_path="yes"/>
</job_manager>

When I increase the number of nodes (#SBATCH --nodes=... in the job
script), the job still runs on just one node, as there is only one initial
task. I guess the lines above from the project file also need to be
modified, not only the Slurm job script. Do you know what these two files
should look like? Or do I need to modify other files in addition?

Thank you for your help!

Best regards,
Benedikt Söldner

--------------------------------------
Benedikt Söldner (PhD student)
benedikt.soeldner AT tu-dortmund.de
Technical University Dortmund, Germany
Research group of Prof. Dr. Rasmus Linser
-----------------------------------------------
Dr Benjamin Bardiaux bardiaux AT pasteur.fr
Unité de Bioinformatique Structurale
CNRS UMR3528 - Institut Pasteur
25,28 rue du Docteur Roux 75015 Paris, France
---------------------------------------------
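
The sbatch-based host entry suggested in the reply above can be generated with a small helper, shown here as a minimal sketch. The function name host_line is hypothetical, the executable path is copied from the thread with its elided "/.../" part left as-is, and the example sbatch options are the ones that appear in the warning message quoted earlier. Which options a given cluster needs is site-specific.

```shell
#!/bin/bash
# Sketch only: prints a <host> line for the <job_manager> section of the
# ARIA project file. CNS_EXE is copied from the thread; "/.../" is elided
# in the original and must be filled in locally.
CNS_EXE="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"

# host_line SBATCH_OPTIONS N_CPU   (hypothetical helper)
# Emits one <host> element whose command submits each CNS run via sbatch.
host_line() {
    printf '<host enabled="yes" command="sbatch %s" executable="%s" n_cpu="%s" use_absolute_path="yes"/>\n' \
        "$1" "$CNS_EXE" "$2"
}

# Example options taken from the warning message quoted in this thread;
# n_cpu is kept at 20 as in the suggested host entry.
host_line "--partition=short --ntasks=1 --cpus-per-task=1" 20
```

The printed element goes inside the existing <job_manager> section in place of the csh-based host entry.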