The SLURM scheduler (Simple Linux Utility for Resource Management) manages and allocates all of Sol's compute nodes. All of your computing must be done on Sol's compute nodes. The following is an abbreviated user guide for SLURM; please visit the SLURM website for more detailed documentation of its tools and capabilities.
SLURM uses the term partition instead of queue. There are several partitions available on Sol for running jobs:
- lts : 20-core nodes purchased as part of the original cluster by LTS.
- Two 2.3GHz 10-core Intel Xeon E5-2650 v3, 25M Cache, 128GB 2133MHz RAM
- lts-gpu: 1 core per lts node is reserved for launching gpu jobs
- im1080 : 24-core nodes purchased by Wonpil Im, Department of Biological Sciences. Users can request a max of 20 cores per node.
- im1080-gpu : 2 cores per im1080 node are reserved for launching gpu jobs.
- Two 2.3GHz 12-core Intel Xeon E5-2670 v3, 30M Cache, 128GB 2133MHz RAM, Two EVGA Geforce GTX 1080 PCIE 8GB GDDR5
- eng : 24-core nodes purchased by various RCEAS faculty.
- eng-gpu : 2 cores per eng node are reserved for launching gpu jobs i.e. 1 core for each gpu.
- Two 2.3GHz 12-core Intel Xeon E5-2670 v3, 30M Cache, 128GB 2133MHz RAM, EVGA Geforce GTX 1080 PCIE 8GB GDDR5. Four nodes have two cards while the other nodes have one card.
- engc : 24-core nodes based on Broadwell CPUs purchased by ChemE Faculty. Users can request a max of 24 cores per node until GPUs are added to these nodes.
- Two 2.2GHz 12-core Intel Xeon E5-2650 v4, 30M Cache, 64GB 2133MHz RAM
- himem : 16-core node purchased by Economics Faculty with 512GB RAM.
- Two 2.6GHz 8-core Intel Xeon E5-2640 v3, 20M Cache, 512GB 2400MHz RAM
- Users utilizing this node will be charged a higher rate of SU consumption (3 SUs per core-hour). Please evaluate the memory consumption of your job before submitting jobs to this partition. If you need to use this partition, please contact Alex Pacheco.
- enge, engi : 36-core nodes purchased by MEM faculty and the ISE Department
- Two 2.3GHz 18-core Intel Xeon Gold 6140, 24.75M Cache, 192GB 2666MHz RAM
- This node features the newer AVX512 vector extension that provides twice the FLOPS of earlier generation Haswell/Broadwell CPUs at the expense of CPU speed.
- im2080: 36-core nodes purchased by Wonpil Im, Department of Biological Sciences. Users can request a max of 28 cores per node.
- im2080-gpu : 8 cores per im2080 node are reserved for launching gpu jobs i.e. 2 cores per gpu.
- Two 2.3GHz 18-core Intel Xeon Gold 6140, 24.75M Cache, 192GB 2666MHz RAM, Four ASUS GeForce RTX 2080TI PCIE 11GB GDDR6
- chem: 36-core Skylake (2) and Cascade Lake (4) nodes purchased by Lisa Fredin, Department of Chemistry
- (2) Two 2.3GHz 18-core Intel Xeon Gold 6140, 24.75M Cache, 192GB 2666MHz RAM
- (4) Two 2.6GHz 18-core Intel Xeon Gold 6240, 24.75M Cache, 192GB 2933MHz RAM
Each partition enforces limits on: maximum wallclock time in hours, minimum/maximum cores per node per job, maximum SUs consumed per node per hour, and maximum memory in GB per core.
The himem partition is for running high memory jobs, i.e. those requiring more than 6GB/core, or for using the Artelys Knitro software. Do not submit jobs to the himem partition if they require lower memory per core. All jobs in the himem partition are charged 3 SUs per core-hour of computing, irrespective of how many cores or how much memory you consume.
To ensure that investors receive their allocation of resources while still maintaining a shared resource, each investor receives a priority boost on their investment. Every job, hotel or condo, receives a base priority of 1 on all partitions. A priority boost of 100 is provided to investors and their collaborators on their investment, which ensures that an investor's job will always start before other users' jobs. Jobs accumulate a priority of 1 for each day spent in the queue, so a non-investor's job in a given partition would have to wait in the queue for 100 days before it could have a higher priority than an investor's job. Below is a table listing the various investors and the partitions where they have priority. All hotel investors get priority access on the lts partition.
|Investor|Partitions|
|Wonpil Im|im1080, im1080-gpu, im2080, im2080-gpu|
|Edmund Webb III|eng|
|Industrial and Systems Engineering|engi|
The current status of partitions and load on nodes is updated every 15 minutes. Accessible on campus or via VPN only; do not bookmark for off-campus use.
Usage reports for current and past allocation cycles. Accessible on campus or via VPN only; do not bookmark for off-campus use.
- Last 2 weeks
- Current Month
- Previous Month
- Allocation Year 2019-20 Report
- Allocation Year 2018-19 Report
- Allocation Year 2017-18 Report
- Allocation Year 2016-17 Report
Detailed annual reports with consumption of resources by users and research groups. Accessible on campus or via VPN only; do not bookmark for off-campus use. Some pages may take a while to load due to the amount of data reported.
- Allocation Year 2019-20 Report
- Allocation Year 2018-19 Report
- Allocation Year 2017-18 Report
- Allocation Year 2016-17 Report
There are four distinct file spaces on Sol.
- HOME, your home directory on Sol
- SCRATCH, scratch storage on the local disk associated with your running job.
- CEPHFS, global parallel scratch for running jobs with a lifetime of 7 days.
- CEPH, Ceph project space for research groups that have purchased a minimum 1TB Ceph project
All Sol users are provided with a 150GB storage quota at /home/username, accessible using the environment variable $HOME. Home storage is a large Ceph project that is not backed up; it is the user's responsibility to maintain backups of data in $HOME. $HOME directories are not deleted as long as annual user account fees are paid by the HPC PIs.
SCRATCH provides 500GB of storage on the local disk of the nodes associated with a running job. This space is not backed up or snapshotted, and it is deleted when the job completes. A user can access this space while a job is running at /scratch/$SLURM_JOB_USER/$SLURM_JOB_ID. Since compute nodes are shared among different users, the available disk space could be less than 500GB. Users of the SCRATCH space must make sure that data is copied back at the end of their jobs; because the scheduler purges SCRATCH storage when a job ends, data that hasn't been copied back cannot be recovered. See below for a sample script using SCRATCH storage.
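A minimal sketch of the stage-in/stage-out pattern described above; the partition, input file, and `./myapp` executable are illustrative placeholders:

```shell
#!/bin/bash
#SBATCH --partition=lts        # illustrative partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --time=12:00:00
#SBATCH --job-name=scratch-demo

# SCRATCH directory created by the scheduler for this job
WORKDIR=/scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}

# Stage input from the submission directory to local scratch
cp ${SLURM_SUBMIT_DIR}/input.dat ${WORKDIR}
cd ${WORKDIR}

# Run the application (./myapp and input.dat are placeholders)
srun ./myapp input.dat > output.log

# Copy results back before SCRATCH is purged at job end
cp output.log ${SLURM_SUBMIT_DIR}/
```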
CEPHFS global parallel scratch
CEPHFS provides 11TB of global parallel scratch storage. This space is not backed up or snapshotted, and all files older than 7 days are deleted. A user can access this space at /share/ceph/scratch/$USER/$SLURM_JOB_ID for running jobs and for 7 days after the job has completed; the SLURM scheduler automatically creates this directory. Users can use this space for writing parallel job output that needs a longer lifetime than that provided by SCRATCH. Since this storage is serviced by SSDs on the Ceph storage cluster, CEPHFS provides better read/write performance than the HOME and CEPH storage spaces. It is the user's responsibility to back up data within 7 days of job completion.
Lehigh Research Computing provides Ceph projects for research groups that require more storage than the 150GB provided with each HPC account. HPC PIs can add their collaborators to their Ceph project, which can be used as a storage space located at /share/ceph/projectname on Sol. Keep in mind that every Ceph project, including $HOME, is a networked file system, and writing job output to these filesystems could affect the performance of your jobs. Ceph projects should be used for storage; all I/O-intensive workloads should use the SCRATCH or CEPHFS global scratch storage.
Running Jobs on Sol
You must be allocated at least one Sol compute node by SLURM to run jobs. Running compute-intensive workloads (i.e. anything other than editing files, submitting and monitoring jobs) on the head/login node is strictly prohibited. Users need to write a script requesting the desired resources from SLURM.
Migrating from PBS to SLURM
The following is a comparison of PBS and SLURM commands to aid users in migrating submit scripts from Corona to Sol.
|  |PBS|SLURM|
|Submit a job|qsub script filename|sbatch script filename|
|Job status by id|qstat jobid|squeue -j jobid|
|Job status by user|qstat -u username|squeue -u username|
|Hold a job|qhold jobid|scontrol hold jobid|
|Release a held job|qrls jobid|scontrol release jobid|
|Node list|pbsnodes|sinfo --Node or scontrol show nodes|
|Wall time limit|-l walltime=hh:mm:ss|--time=hh:mm:ss|
|Standard output file|-o file name|--output file name|
|Standard error file|-e file name|--error file name|
|Combine stdout & stderr|-j oe (both to stdout)|Default directive if --error is not specified|
|Requeue on failure|-r y/n|--requeue or --no-requeue|
|Number of Processors|-l procs=N|--ntasks=N|
|Number of Nodes|-l nodes=N|--nodes=N|
|Number of Processors per Node|-l nodes=N:ppn=P|--ntasks-per-node=P|
There are two types of jobs that can be run on Sol:
- Interactive Jobs
- Batch Jobs
These are jobs that provide an interactive environment or command-line prompt from which users can enter commands to run simulations. They are best used for testing and debugging and are not appropriate for long-running production jobs. Resources can be requested using the srun command with at least the option --pty /bin/bash to launch a pseudo-terminal; other options include the partition, number of nodes, tasks per node, and time.
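For example, the following requests one node with 4 tasks for one hour and opens a shell on the allocated node; the partition and limits are illustrative:

```shell
srun --partition=lts --nodes=1 --ntasks-per-node=4 --time=1:00:00 --pty /bin/bash
```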
When a resource becomes available, SLURM will provide you with a command prompt on the compute node you have been allocated. Until a resource is available, you will not be able to use the command prompt in the shell where the above command was executed. If you cancel the command with Ctrl-C, your interactive job request will be cancelled. Depending on how busy the cluster is, your wait could be a few minutes to a few days.
All compute nodes have a naming convention sol-[a-e][1-6][00-18], for e.g. sol-a104. Do not run jobs on the head/login node i.e. sol.
These are jobs that require writing a series of commands in a shell script that SLURM will execute on the compute node. Resources can be requested in the script or as options to the sbatch command when submitting the script to the SLURM scheduler.
Sample Scripts for Batch Jobs
Command line options to sbatch override #SBATCH directives in the submit script.
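A minimal sketch of a submit script; the partition, module, and executable names are illustrative placeholders:

```shell
#!/bin/bash
#SBATCH --partition=lts            # partition to run in
#SBATCH --nodes=2                  # number of nodes
#SBATCH --ntasks-per-node=20       # processes per node
#SBATCH --time=24:00:00            # wallclock limit, hh:mm:ss
#SBATCH --job-name=myjob
#SBATCH --output=myjob.%j.out      # %j expands to the job ID

cd ${SLURM_SUBMIT_DIR}
module load myapp                  # placeholder module name
srun ./myapp                       # placeholder executable
```

Submit it with `sbatch myjob.sh`.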
Submitting Dependency jobs
Suppose you want to run a long simulation that is split into multiple sequential runs to fit within the maximum walltimes of the partitions. One common method is to create a job submission script for each sequential step, each either submitted by the previous job or submitted manually when the previous job completes. The former is not recommended since some systems do not allow job submission from compute nodes (you might encounter the same issue on national resources, as very few systems have queue walltimes longer than 7 days), and if you run out of walltime the subsequent job may never be submitted. With the latter, you lose valuable time if you are not monitoring your jobs and are not available to submit the subsequent job.
The recommended method is to submit jobs with a dependency attribute for the second and subsequent jobs. On Sol and any system that uses the SLURM job scheduler, dependency jobs are created by adding the --dependency=... flag to the sbatch command.
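For example, using the placeholders <JobID> and <Submit Script>:

```shell
# Queue a script to start only after job <JobID> completes successfully
sbatch --dependency=afterok:<JobID> <Submit Script>
```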
Here, you are submitting a SLURM script <Submit Script> that depends on a previous job with ID <JobID>. Options that can be added to the dependency argument are
- afterok:<JobID> Job will be scheduled to run only if Job <JobID> had completed with no errors
- afternotok:<JobID> Job will be scheduled to run only if Job <JobID> has completed with errors
- afterany:<JobID> Job will be scheduled to run after Job <JobID> has completed, with or without errors
SLURM also accepts abbreviated notation for sbatch options, e.g. -n <total procs> as shorthand for --ntasks=<total procs>.
SLURM provides various tools for monitoring and manipulating jobs
Check queue status
- -u <username>: show status of all jobs for a particular user
- -j <jobid>: show status for jobid
- -l: show long format of queue status
- -p <name>: show status of all jobs in partition <name>
- --start: show estimated start time
Use the --help option to see a full list of allowed options and usage.
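For instance, these options can be combined (the partition name is illustrative):

```shell
squeue -u $USER -l      # long-format status of all your jobs
squeue -p lts --start   # queued jobs in lts with estimated start times
```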
checkq is a script, accessible through the soltools module, that provides squeue with some useful defaults and accepts the above options.
Cancel/delete a job
You can only delete your own jobs, whether queued or already running.
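Jobs are cancelled with scancel; a minimal example (the job ID is illustrative):

```shell
scancel 123456     # cancel a single job by ID
scancel -u $USER   # cancel all of your own queued and running jobs
```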
Manipulate Jobs in Queue
A user or admin can manipulate jobs that are in queue i.e. not running yet.
You can only release jobs that you have held. If an admin has held your job, only the admin can release it.
Examples of SPECIFICATION are
- add dependency after a job has been submitted: dependency=<attributes>
- change job name: jobname=<name>
- change partition: partition=<name>
- modify requested runtime: timelimit=<hh:mm:ss>
- request gpus (when changing to one of the gpu partitions): gres=gpu:<1,2,3 or 4>
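Putting the above together, a queued job can be held, modified with scontrol update, and released; the job ID and values below are illustrative:

```shell
scontrol hold 123456
scontrol update jobid=123456 partition=im1080-gpu gres=gpu:1 timelimit=24:00:00
scontrol release 123456
```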
checkload is a script, accessible through the soltools module, that provides sinfo with some useful defaults and accepts the above options.
Click here for the status of Sol partitions, updated every 15 minutes and accessible on campus or via VPN only. This page is generated from the output of checkq and checkload for partition status and node usage respectively.