This section describes key aspects of the platforms that the IPS has been ported to, the locations on each machine that are relevant to the IPS, and the platform configuration settings, both in general and for the specific platforms described below.
Important Note: while this documentation is intended to remain up to date, it may not always reflect the current status of the machines. If you run into problems, verify the information below against the machine's website. If you are still having problems, contact the framework developers.
Each subsection contains information about the platform in question. If you are porting the IPS to a new platform, these are the items that you will need to know and the files and directories that you will need to create. You will also need a platform configuration file (described below). Available queue names are listed, with the most common ones in bold.
The platforms below fall into the following categories:
- general production machines - large production machines on which the majority of runs (particularly production runs) are made.
- experimental systems - production or shared machines that are being used by a subset of SWIM members for specific research projects. It may also be difficult for others to get accounts on these systems.
- formerly used systems - machines that the IPS was ported to, but on which we no longer have time, that have been retired by their hosting site, or that are no longer in wide use.
- single user systems - laptop or desktop machines for testing small problems.
Franklin is a Cray XT4 managed by NERSC.
Hopper is a Cray XE6 managed by NERSC.
Swim is an SMP hosted by the fusion theory group at ORNL.
Pacman is a linux cluster hosted at ARSC.
Iter is a linux cluster (?) that is hosted ???.
Odin is a linux cluster hosted at Indiana University.
Sif is a linux cluster hosted at Indiana University.
Viz/mhd are SMP machines hosted at PPPL. These systems no longer appear to be online.
Pingo was a Cray XT5 hosted at ARSC.
Jaguar is a Cray XT5 managed by OLCF.
The IPS can be run on your laptop or desktop. Many of the items above are not present or relevant in a laptop/desktop environment. See the next section for sample platform configuration settings.
The platform configuration file contains platform-specific information that the framework needs. Typically it does not need to change from one user to another or from one run to another (except when manually specifying allocation resources). For most of the platforms above, you will find a platform configuration file of the form ips/<machine name>.conf. It is unlikely that you will need to change this file, but it is described here for users working on experimental machines, for manual specification of resources, and for users who need to port the IPS to a new machine. For example, the platform configuration file for Franklin looks like this:
HOST = franklin
MPIRUN = aprun
PHYS_BIN_ROOT = /project/projectdirs/m876/phys-bin/phys/
DATA_TREE_ROOT = /project/projectdirs/m876/data
DATA_ROOT = /project/projectdirs/m876/data/
PORTAL_URL = http://swim.gat.com:8080/monitor
RUNID_URL = http://swim.gat.com:4040/runid.esp
#######################################
# resource detection method
#######################################
NODE_DETECTION = checkjob # checkjob | qstat | pbs_env | slurm_env
#######################################
# manual allocation description
#######################################
TOTAL_PROCS = 16
NODES = 4
PROCS_PER_NODE = 4
#######################################
# node topology description
#######################################
CORES_PER_NODE = 4
SOCKETS_PER_NODE = 1
#######################################
# framework setting for node allocation
#######################################
# MUST ADHERE TO THE PLATFORM'S CAPABILITIES
# * EXCLUSIVE : only one task per node
# * SHARED : multiple tasks may share a node
# For single node jobs, this can be overridden allowing multiple
# tasks per node.
NODE_ALLOCATION_MODE = EXCLUSIVE # SHARED | EXCLUSIVE
[1] This value should not change unless the machine is upgraded to a different architecture or implements different allocation policies.
[2] Used in manual allocation detection; it will override any detected ppn value (if smaller than the machine's maximum ppn).
[3] Only used if manual allocation is specified, or if no detection mechanism is specified and none of the other mechanisms work first. It is the user's responsibility to make sure this value makes sense.
[4] The porting documentation is currently under construction. Use the python script ips/framework/utils/test_resource_parsing.py to determine which automatic parsing method works for the platform in question. If none of them work, use the manual settings and contact the framework developers about developing a method for automatically detecting the allocation.
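As a quick check of footnote [4], the test script can be run from inside a batch or interactive allocation on the target machine; this is only a minimal sketch, and the output will depend on which detection mechanisms are available there:

# run from within an interactive or batch allocation so that the
# scheduler environment the script probes is actually present
python ips/framework/utils/test_resource_parsing.py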
Due to recent changes to the framework's resource management, some platforms may not have platform configuration files in the repository. Below is a list of those that are in the repository and work with these changes.
In addition to these files, there is ips/workstation.conf, a sample platform configuration file for a workstation. It assumes that the workstation:
- does not have a batch scheduler or resource manager
- may have multiple cores and sockets
- does not have portal access
- will manually specify the allocation
HOST = workstation
MPIRUN = mpirun # eval
PHYS_BIN_ROOT = /home/<username>/phys-bin
DATA_TREE_ROOT = /home/<username>/swim_data
DATA_ROOT = /home/<username>/swim_data
#PORTAL_URL = http://swim.gat.com:8080/monitor
#RUNID_URL = http://swim.gat.com:4040/runid.esp
#######################################
# resource detection method
#######################################
NODE_DETECTION = manual # checkjob | qstat | pbs_env | slurm_env | manual
#######################################
# manual allocation description
#######################################
TOTAL_PROCS = 4
NODES = 1
PROCS_PER_NODE = 4
#######################################
# node topology description
#######################################
CORES_PER_NODE = 4
SOCKETS_PER_NODE = 1
#######################################
# framework setting for node allocation
#######################################
# MUST ADHERE TO THE PLATFORM'S CAPABILITIES
# * EXCLUSIVE : only one task per node
# * SHARED : multiple tasks may share a node
# For single node jobs, this can be overridden allowing multiple
# tasks per node.
NODE_ALLOCATION_MODE = SHARED # SHARED | EXCLUSIVE
[5] These need to be updated to match the “allocation” size each time. Alternatively, you can use the command line to specify the number of nodes and processes per node.
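For example, instead of editing the file, a run might specify the allocation size on the command line along the lines of the sketch below. The option names (--nodes, --ppn) and the file names shown here are assumptions, so check ips.py --help on your installation for the exact spelling:

# hypothetical invocation: overrides NODES / PROCS_PER_NODE for this run only;
# verify the option and file names against "ips.py --help"
ips.py --simulation=my_sim.conf --platform=workstation.conf --nodes=1 --ppn=4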