This reference guide is designed to help you through the process of setting up and running a simulation. It provides instructions on how to change configuration files, how to build and run the IPS on a given platform, and how to determine whether the simulation is set up correctly and will produce the correct data. In the various sections the user will find a series of questions designed to help plan for the preparation, execution, and post-processing of a run (or series of runs).
Before going further, some basic definitions of terms used in the IPS must be presented. These terms are specific to the IPS and may carry different meanings in other contexts. The definitions here are brief and intended to remind the user of their meaning.
This section consists of an outline of how the IPS is intended to be used. It will walk you through the steps from forming an idea of what to run, through running it and analyzing the results. This will also serve as a reference for running IPS simulations. If you are not comfortable with the elements of an IPS simulation, then you should start with the sample simulations in Getting Started and review the terminology above.
Before embarking on a simulation experiment, the problem that you are addressing needs to be determined. The problem may be a computational one, where you are trying to determine if a component works properly, or an experiment to determine the scalability or sensitivity to computational parameters, such as time step length or number of particles. The problem may pertain to a study of how a component, or set of components, compares to previous results or real data. The problem may be to figure out which of a set of variations produces the most stable plasma conditions. In each case, you will need to determine:
- what components are needed to perform this experiment?
- what input files must be obtained, prepared or generated (for each component and the simulation as a whole)?
- does this set of components make sense?
- what driver(s) are needed to perform this experiment?
- do new components and drivers need to be created?
- does it make sense to run multiple simulations in a single IPS instance?
- how will multiple simulations affect the computational needs and the amount of data that is produced?
- what plasma state files are needed?
- where will initial plasma state values (and those not modeled by components in this scenario) come from?
- how much compute time and how many resources are needed for each task? For the simulation as a whole?
- are there any restrictions on where or when this experiment can be run?
- how will the output data be analyzed?
- where will the output data go when the simulation is completed?
- when and where will the output data be analyzed?
Once you have a plan for constructing, managing and analyzing the results of your simulation(s), it is time to begin preparation.
In many cases, new components or modifications to existing components need to be made. In this section, the anatomy of a component and a driver are explained for a simple invocation style of execution (see Advanced User Guide for more information on creating components and drivers with complex logic, parallelism and asynchronous control flow).
Each component is derived from the Component class, meaning that it inherits a few base capabilities and then must augment them. Each IPS component must implement the following methods: init(), step(), and finalize() (and, where checkpoint/restart support is needed, checkpoint() and restart()).
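A minimal sketch of such a component is shown below. It is illustrative only: the import path of the Component base class, the specific services calls used (stage_input_files, stage_plasma_state, launch_task, wait_task, update_plasma_state, stage_output_files), and the executable name are assumptions that should be checked against the skeleton component (ips/doc/examples/skeleton_component.py) shipped with your IPS version.

# Minimal sketch of an IPS component (hypothetical names; compare with
# ips/doc/examples/skeleton_component.py for the exact API in your IPS tree).
from component import Component   # assumed import path of the base class

class MyComponent(Component):
    def __init__(self, services, config):
        # The base class stores the services handle and the component's
        # configuration parameters (which become attributes, e.g. self.NPROC).
        Component.__init__(self, services, config)

    def init(self, timeStamp=0.0):
        # Stage input files into the component work directory.
        self.services.stage_input_files(self.INPUT_FILES)
        return

    def step(self, timeStamp=0.0):
        # Typical pattern: get the latest plasma state, run the physics
        # executable as a task, then publish updated state and outputs.
        self.services.stage_plasma_state()
        work_dir = self.services.get_working_dir()
        task_id = self.services.launch_task(self.NPROC, work_dir,
                                            self.BIN_PATH + '/my_executable')
        self.services.wait_task(task_id)
        self.services.update_plasma_state()
        self.services.stage_output_files(timeStamp, self.OUTPUT_FILES)
        return

    def finalize(self, timeStamp=0.0):
        # Clean-up work at the end of the simulation (often a no-op).
        return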
There are two ways to create a new component: start from “scratch” by copying and renaming the skeleton component (ips/doc/examples/skeleton_component.py) to your desired location [1], or modify an existing component (e.g., ips/doc/examples/example_component.py). When creating your new component, keep in mind that it should be somewhat general and usable in multiple contexts. In general, for things that change often, you will want to use component configuration variables or input files to drive the logic or set parameters for the tasks. For more in-depth information about how to create components and add them to the build process, see Developing Drivers and Components for IPS Simulations.
When changing an existing component in a way that will diverge from the existing version, be sure to create a new version. If you are editing an existing component to improve it, be sure to document the changes you make.
[1] Components are located in the ips/components/ directory and are organized by port name, followed by implementation name. It is also common to put input files and helper scripts in the directory as well.
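On the driver side, the step() method typically looks up the physics components through their port names and then sequences their init(), step(), and finalize() calls over the time loop, expressing the data dependencies between them. The following is an illustrative sketch only; the port names, import path, and services calls (get_port, call, get_time_loop, checkpoint_components) are assumptions to be verified against the GENERIC_DRIVER implementation in your IPS tree.

# Illustrative sketch of a simple driver (hypothetical; compare with the
# GENERIC_DRIVER shipped with the IPS).
from component import Component   # assumed import path of the base class

class ExampleDriver(Component):
    def __init__(self, services, config):
        Component.__init__(self, services, config)

    def init(self, timeStamp=0.0):
        return

    def step(self, timeStamp=0.0):
        services = self.services

        # Look up the components bound to ports listed in the [PORTS] section.
        epa = services.get_port('EPA')
        rf = services.get_port('RF_IC')

        # Initialize each physics component once.
        services.call(epa, 'init', timeStamp)
        services.call(rf, 'init', timeStamp)

        # Advance the simulation over the times defined in [TIME_LOOP],
        # expressing the data dependency EPA -> RF_IC at each step.
        for t in services.get_time_loop():
            services.call(epa, 'step', t)
            services.call(rf, 'step', t)
            services.checkpoint_components([epa, rf], t)

        # Finalize each component at the end of the run.
        services.call(epa, 'finalize', timeStamp)
        services.call(rf, 'finalize', timeStamp)

    def finalize(self, timeStamp=0.0):
        return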
At this point, all components and drivers should be added to the repository, and any makefiles modified or created (see makefile section of component writing guide). You are now ready to set up the execution environment, build the IPS, and prepare the input and configuration files.
First, the platform on which to run the simulation must be determined. When choosing a platform, take into consideration:
- The parallelism of the tasks you are running
- Does your problem require 10s, 100s or 1000s of cores?
- How well do your tasks take advantage of “many-core” nodes?
- The location of the input files and executables
- Does your input data exist on a suitable platform?
- Is it reasonable to move the data to another machine?
- Time and CPU hours
- How much time will it take to run the set of simulations for the problem?
- Is there enough CPU time on the machine you want to use?
- Dealing with results
- Do you have access to enough hard drive space to store the output of the simulation until you have the time to analyze and condense it?
Once you have chosen a suitable platform, you may build the IPS like so:
host ~ > cd <path to ips>
host ips > . swim.bashrc.<machine_name>
host ips > svn up
host ips > make clean
host ips > cp config/makeconfig.<machine_name> config/makeconfig.local
host ips > make
host ips > make install
Second, construct input files or edit the appropriate ones for your simulation. This step is highly dependent on your simulation, but make sure that you check for the following things (and recheck after constructing the configuration file!):
- Does each component have all the input files it needs?
- Are there any global initial files, and are they present? (This includes any plasma state and non-plasma state files.)
- For each component input file: Are the values present, valid, and consistent?
- For the collection of files for each component: Are the values present, valid, and consistent?
- For the collection of files for each simulation: Are the values present, valid, and consistent?
- Do the components model all of the targeted domain and phenomena of the experiment?
- Does the driver use the components you expect?
- Does the driver implement the data dependencies between the components as you wish?
Third, you must construct the configuration file. It is helpful to start with a configuration file that is related to the experiment you are working on, or you may start from the example configuration file and edit it from there. Some configuration file values are user specific, some are platform specific, and others are simulation or component specific. It may be helpful to save your personal versions on each machine in your home directory or some other persistent storage location for reuse and editing. These tend not to be good files to keep in subversion; however, there are some examples in the example directory to get you started. The most common and required configuration file entries are explained here. For a more complete description of the configuration options, see The Configuration File - Explained.
User Data Section:
USER_W3_DIR = <location of your web directory on this platform>
USER_W3_BASEURL = <URL of your space on the portal>
USER = <user name> # Optional, if missing the unix username is used
Set these values to the www directory you created for your own runs, a matching URL for the portal to store your run information, and your user name (this is used on the portal to identify simulations you run). These should be the same for all of your runs on a given platform.
Simulation Info Section:
RUN_ID = <short name of run>
TOKAMAK_ID = <name of the tokamak>
SHOT_NUMBER = 1
...
SIM_NAME = ${RUN_ID}_${SHOT_NUMBER}
OUTPUT_PREFIX =
IPS_ROOT = <location of built ips>
SIM_ROOT = <location of output tree>
RUN_COMMENT = <used by portal to help identify what ran and why>
TAG = <grouping string>
...
SIMULATION_MODE = NORMAL
RESTART_TIME =
RESTART_ROOT = ${SIM_ROOT}
In this section the simulation is described and key locations are specified. RUN_COMMENT and TAG, along with RUN_ID, TOKAMAK_ID, and SHOT_NUMBER, are used by the portal to describe this simulation. RUN_ID, TOKAMAK_ID, and SHOT_NUMBER are commonly used to construct the SIM_NAME, which is often used as the directory name of the SIM_ROOT. The IPS_ROOT is the top level of the IPS source tree that you are using to execute this simulation. Finally, the SIMULATION_MODE and related items identify the simulation as a NORMAL or RESTART run.
Logging Section:
LOG_FILE = ${RUN_ID}_sim.log
LOG_LEVEL = DEBUG | WARN | INFO | CRITICAL
The logging section defines the name of the log file and the default level of logging for the simulation. The log file for the simulation will contain all logging messages generated by the components in this simulation. Logging messages from the framework and services are written to the framework log file. The LOG_LEVEL may differ from the framework log level and may be set to one of the following values, in order of most verbose to least [2]: DEBUG, INFO, WARN, ERROR, CRITICAL.
Plasma State Section:
PLASMA_STATE_WORK_DIR = ${SIM_ROOT}/work/plasma_state
# Config variables defining simulation specific names for plasma state files
CURRENT_STATE = ${SIM_NAME}_ps.cdf
PRIOR_STATE = ${SIM_NAME}_psp.cdf
NEXT_STATE = ${SIM_NAME}_psn.cdf
CURRENT_EQDSK = ${SIM_NAME}_ps.geq
CURRENT_CQL = ${SIM_NAME}_ps_CQL.dat
CURRENT_DQL = ${SIM_NAME}_ps_DQL.nc
CURRENT_JSDSK = ${SIM_NAME}_ps.jso
# List of files that constitute the plasma state
PLASMA_STATE_FILES1 = ${CURRENT_STATE} ${PRIOR_STATE} ${NEXT_STATE} ${CURRENT_EQDSK}
PLASMA_STATE_FILES2 = ${CURRENT_CQL} ${CURRENT_DQL} ${CURRENT_JSDSK}
PLASMA_STATE_FILES = ${PLASMA_STATE_FILES1} ${PLASMA_STATE_FILES2}
This section specifies the naming convention for the plasma state files so the framework and components can manipulate and reference them in the config file and during execution. The initial file locations are also specified here.
Ports Section:
[PORTS]
NAMES = INIT DRIVER MONITOR EPA RF_IC FP NB FUS
# Required ports - DRIVER and INIT
[[DRIVER]]
IMPLEMENTATION = GENERIC_DRIVER
[[INIT]]
IMPLEMENTATION = minimal_state_init
# Physics ports
[[RF_IC]]
IMPLEMENTATION = model_RF_IC
[[FP]]
IMPLEMENTATION = minority_model_FP
[[FUS]]
IMPLEMENTATION = model_FUS
[[NB]]
IMPLEMENTATION = model_NB
[[EPA]]
IMPLEMENTATION = model_EPA
[[MONITOR]]
IMPLEMENTATION = monitor_comp_4
The ports section specifies which ports are included in the simulation and which implementation of the port is to be used. Note that a DRIVER must be specified, and a warning will be issued if there is no INIT component present at start up. The value of IMPLEMENTATION for a given port must correspond to a component description below.
Component Configuration Section:
[<component name>]
CLASS = <port name>
SUB_CLASS = <type of component>
NAME = <class name of component implementation>
NPROC = <# of procs for task invocations>
BIN_PATH = ${IPS_ROOT}/bin
INPUT_DIR = ${DATA_TREE_ROOT}/<location of input directory>
INPUT_FILES = <input files for each step>
OUTPUT_FILES = <output files to be archived>
PLASMA_STATE_FILES = ${CURRENT_STATE} ${NEXT_STATE} ${CURRENT_EQDSK}
RESTART_FILES = ${INPUT_FILES} <extra state files>
SCRIPT = ${BIN_PATH}/<component implementation>
For each component, fill in or modify the entry to match the locations of the input, output, plasma state, and script files. Also, be sure to adjust the NPROC entry to suit the problem size and scalability of the executable, and add any component-specific entries that the component implementation calls for. The data tree is a SWIM-public area where simulation input data can be stored. It allows multiple users to access the same data with reasonable assurance that they are indeed using the same versions. On Franklin the data tree root is /project/projectdirs/m876/data/, and on Stix it is /p/swim1/data/. The PLASMA_STATE_FILES entry must name files that are part of the simulation plasma state; it may be a subset if some files are not needed by the component on each step. Additional component-specific entries can also appear here to signal a piece of logic or set a data value.
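As a concrete illustration, a filled-in entry for the model_EPA implementation named in the ports section above might look like the following; all paths and file names here are hypothetical placeholders, not values shipped with the IPS:

[model_EPA]
CLASS = epa
SUB_CLASS = model_epa
NAME = model_EPA
NPROC = 1
BIN_PATH = ${IPS_ROOT}/bin
INPUT_DIR = ${DATA_TREE_ROOT}/model_epa/ITER
INPUT_FILES = model_epa_input.nml
OUTPUT_FILES = model_epa_log.out
PLASMA_STATE_FILES = ${CURRENT_STATE} ${NEXT_STATE} ${CURRENT_EQDSK}
RESTART_FILES = ${INPUT_FILES}
SCRIPT = ${BIN_PATH}/model_epa_component.py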
Checkpoint Section:
[CHECKPOINT]
MODE = WALLTIME_REGULAR
WALLTIME_INTERVAL = 15
NUM_CHECKPOINT = 2
PROTECT_FREQUENCY = 5
This section specifies the checkpoint policy you would like enforced for this simulation, and the corresponding parameters to control the frequency and number of checkpoints taken. See the comments in the example configuration file or the configuration file documentation. If you are debugging or running a component or simulation for the first time, it is a good idea to take frequent checkpoints until you are confident that the simulation will run properly. For guidance on specifying the checkpoint interval, see Fundamentals of the Advanced Features of the IPS.
Time Loop Section:
[TIME_LOOP]
MODE = REGULAR
START = 0.0
FINISH = 20.0
NSTEP = 5
This section sets up the time loop to help the driver manage the time progression of the simulation. If you are debugging or running a component or simulation for the first time, it is a good idea to take very few steps until you are confident that the simulation will run properly.
Lastly, double-check that your input files and config file are both self-consistent and make physics sense.
[2] For more information and guidance about how the Python logging module works, see the Python logging module tutorial.
Now that you have everything set up, it is time to construct the batch script to launch the IPS. Just like the configuration files, this tends to be user specific and platform specific, so it is a good idea to keep a local copy in a persistent directory on each platform you tend to use for easy modification.
As an example, here is a skeleton of a batch script for Franklin:
#! /bin/bash
#PBS -A <project code for accounting>
#PBS -N <name of simulation>
#PBS -j oe # joins stdout and stderr
#PBS -l walltime=0:6:00
#PBS -l mppwidth=<number of *cores* needed>
#PBS -q <queue to submit job to>
#PBS -S /bin/bash
#PBS -V
IPS_ROOT=<location of IPS root>
cd $PBS_O_WORKDIR
umask 0222
$IPS_ROOT/bin/ips [--config=<config file>]+ \
--platform=$IPS_ROOT/franklin.conf \
--log=<name of log file> \
[--debug] \
[--nodes=<number of nodes in this allocation>] \
[--ppn=<number of processes per node for this allocation>]
Note that you can only run one instance of the IPS per batch submission; however, you may run multiple simulations in the same batch allocation by specifying multiple --config=<config file> entries on the command line. Each config file must have a unique file name and a unique SIM_ROOT. The different simulations will share the resources in the allocation, in many cases improving resource efficiency; however, this may make the execution time of each individual simulation somewhat longer due to waiting on resources. For more information on running multiple simulations, see Fundamentals of the Advanced Features of the IPS.
The IPS also needs information about the platform it is running on (--platform=$IPS_ROOT/franklin.conf) and a log file (--log=<name of log file>) for the framework output. Platform files for commonly used platforms are provided in the top level of the ips directory. It is strongly recommended that you use the appropriate one for launching IPS runs. See Platforms and Platform Configuration for more information on how to use or create these files.
Lastly, there are some optional command line arguments that you may use. --debug will turn on debugging information from the framework. --nodes and --ppn allow the user to manually set the number of nodes and processes per node for the framework. This overrides any detection by the framework and should be used with caution. It is, however, a convenient way to run the IPS on a machine without a batch scheduler.
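For example, a hypothetical interactive launch on a workstation without a scheduler, running two simulations in a single IPS instance, might look like this (the configuration and platform file names are placeholders):

host ~ > cd <path to run directory>
host run > $IPS_ROOT/bin/ips --config=sim_A.conf --config=sim_B.conf \
               --platform=$IPS_ROOT/workstation.conf \
               --log=ips_framework.log --nodes=1 --ppn=8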
Once your job is running, you can watch its progress on the portal. Note that each simulation appears separately on the portal, so a job running multiple simulations will show up as multiple simulations that all started at around the same time.
Once your run (or set of runs) is done, it is time to look at the output. First, we will examine the structure of the output tree:
${SIM_ROOT}/
    ${PORTAL_RUNID}
        File containing the portal run ids that are associated with this directory. There can be more than one.
    <platform config file>
    <simulation configuration files>
        Each simulation configuration file that used this sim root.
    restart/
        <each checkpoint>/
            <each component>/
                Directory containing the restart files for this checkpoint.
    simulation_log/
        Directory containing the event log for each runid.
    simulation_results/
        <each time step>/
            components/
                <each component>/
                    Directory containing the output files for the given component at the given step.
        <each component>/
            Directory containing the output files for each step. File names are appended with the time step to avoid collisions.
    simulation_setup/
        <each component>/
            Directory containing the input files from the beginning of the simulation.
    work/
        <each component>/
            Directory where the component computes from time step to time step. Leftover input and output files from the last step will be present at the end of the simulation.
There are a few tools for visualizing and performing light analysis on a run or set of runs:
Using these utilities, your own scripts, or manual inspection, results can be analyzed and bugs found. Debugging a coupled simulation is more complicated than debugging a standalone code. Here are some things to consider when a problem is encountered:
If you are working out a problem, it is always good to: