
Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm


  • From: "Benedikt Soeldner" <benedikt.soeldner AT tu-dortmund.de>
  • To: "Benjamin Bardiaux" <bardiaux AT pasteur.fr>
  • Cc: aria-discuss AT services.cnrs.fr
  • Subject: Re: [aria-discuss] Running Aria on multiple nodes on a cluster managed by Slurm
  • Date: Fri, 7 May 2021 14:43:05 +0200
  • Importance: Normal

Dear Benjamin,

It worked! Thank you very much for your help; this speeds up my Aria
calculations a lot.

Best,

Benedikt Söldner



> Dear Benedikt,
>
> My bad, I forgot to mention the required "--no-test" option on the aria2
> command line when using sbatch submission.
> (It disables the host checking.)
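>
> Concretely, the aria2 call in the submission script would then look
> roughly like this (a sketch only; paths elided as in your script below,
> everything else unchanged):
>
> srun python -O /.../programs/aria2.3.2/aria2.py --no-test \
>     --output=run01msg.txt Aria-Project_run01.xml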
>
> Best,
>
> Benjamin
>
> -----------------------------------------------------
> Dr Benjamin Bardiaux | Bioinformatique Structurale
> bardiaux AT pasteur.fr | Institut Pasteur
> 25,28 rue du Docteur Roux 75015 Paris, France
> -----------------------------------------------------
>
>> On 3 May 2021, at 10:43, Benedikt Soeldner
>> <benedikt.soeldner AT tu-dortmund.de> wrote:
>>
>> Dear Benjamin,
>>
>> Thank you for the tip. I tried it out on Friday, but unfortunately the
>> following error occurred right at the beginning:
>>
>> MESSAGE [Project]: Checking host list ...
>> WARNING [Job manager]: Command "sbatch --job-name=CNS_Aria_run_2021-04-30
>> --error=/.../cns_error_msg.txt --output=/.../cns_output_msg.txt
>> --partition=short --ntasks=1 --cpus-per-task=1" ... failed
>> (connection failed or unable to open temporary file.)
>> WARNING [Project]: At least one command could not be executed. Please
>> check your host list setup.
>>
>> Could the reason for this problem be that the check_host.csh file in the
>> temporary directory doesn't look like a Slurm job script (in contrast to
>> the files refine.csh and refine_water.csh in the same directory)? If so,
>> does this file need to be modified, or is there another problem?
>>
>> Best regards,
>>
>> Benedikt
>>
>>
>>
>>> Dear Benedikt,
>>>
>>> If your Slurm setup allows submission from the node where you're
>>> running Aria, you could use
>>>
>>> <job_manager default_command="csh -f">
>>> <host enabled="yes" command="sbatch --your.sbatch.options"
>>> executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
>>> n_cpu="20" use_absolute_path="yes"/>
>>> </job_manager>
>>>
>>>
>>> This means that each CNS calculation will be submitted independently
>>> via sbatch and hence run in parallel.
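>>>
>>> For illustration only, the full block could then look like this (the
>>> sbatch options here are placeholders; use whatever your site requires,
>>> e.g. partition, ntasks, cpus-per-task):
>>>
>>> <job_manager default_command="csh -f">
>>> <host enabled="yes"
>>> command="sbatch --partition=short --ntasks=1 --cpus-per-task=1"
>>> executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
>>> n_cpu="20" use_absolute_path="yes"/>
>>> </job_manager>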
>>>
>>> Best regards,
>>>
>>> Benjamin
>>>
>>>
>>>> On 30/04/2021 11:19, Benedikt Soeldner wrote:
>>>> Dear Aria discussion group,
>>>>
>>>> For a few months now, I have been running my Aria calculations on a
>>>> cluster managed with the Slurm scheduling system. Unfortunately, I
>>>> haven't yet found out how to run Aria, i.e., the CNS calculation
>>>> processes, on multiple nodes in parallel. Does any of you know if and
>>>> how this works?
>>>>
>>>> At the moment, I start my Aria calculations by submitting a job script
>>>> that looks like the following:
>>>>
>>>> #!/bin/bash -l
>>>> #SBATCH --job-name=Aria-Project_run01
>>>> #SBATCH --output=Aria-Project_run01.out
>>>> #SBATCH --error=Aria-Project_run01.err
>>>> #SBATCH --partition=short
>>>> #SBATCH --ntasks=1
>>>> #SBATCH --cpus-per-task=20
>>>> #SBATCH --mem=8G
>>>> module purge
>>>> module load python/2.7.18
>>>> cd /.../Aria-Project
>>>> srun python -O /.../programs/aria2.3.2/aria2.py --output=run01msg.txt \
>>>>     Aria-Project_run01.xml
>>>>
>>>> And in the Aria project file, I have the following lines:
>>>>
>>>> <job_manager default_command="csh -f">
>>>> <host enabled="yes" command="csh -f"
>>>> executable="/.../programs/cns_solve_1.21/intel-x86_64bit-linux/bin/cns_solve"
>>>> n_cpu="20" use_absolute_path="yes"/>
>>>> </job_manager>
>>>>
>>>> When I increase the number of nodes (#SBATCH --nodes=... in the job
>>>> script), the job still runs on just one node, as there is only one
>>>> initial task. I guess the lines above from the project file also need
>>>> to be modified somehow, not only the Slurm job script. Do you know what
>>>> these two files should look like? Or do I need to modify other files in
>>>> addition?
>>>>
>>>> Thank you for your help!
>>>>
>>>> Best regards,
>>>>
>>>> Benedikt Söldner
>>>>
>>>> --------------------------------------
>>>> Benedikt Söldner (PhD student)
>>>> benedikt.soeldner AT tu-dortmund.de
>>>> Technical University Dortmund, Germany
>>>> Research group of Prof. Dr. Rasmus Linser
>>>>
>>>>
>>>
>>>
>>> --
>>> ---------------------------------------------
>>> Dr Benjamin Bardiaux bardiaux AT pasteur.fr
>>> Unité de Bioinformatique Structurale
>>> CNRS UMR3528 - Institut Pasteur
>>> 25,28 rue du Docteur Roux 75015 Paris, France
>>> ---------------------------------------------
>>>
>>
>>
>




