
RIKEN Integrated Cluster of Clusters System

User's Guide

Version 1.24
Sep. 14, 2015
Advanced Center for Computing and
Communication
RIKEN

Version management table

Version  Revision date  Change content
1.4      2010.02.19     1.1 Outline of the System modified
                        2.1.5 Access to the RIKEN network from RICC added
                        2.2 Account and Authentication modified
                        2.6 Login environment modified
                        3.1 Available file area modified
                        3.2.3 Local disk area (work area) modified
                        4. How to create jobs modified
                        5.1.1.3.1 Example of chain job added
                        5.1.1.4 Major options for job submission modified
                        5.1.1.6 Software resource modified
                        5.1.1.10 Script file for Batch job modified
                        5.1.2 Confirm job information modified
                        5.1.3 Operate job modified
                        5.2 Interactive Job modified
                        6.6 FTL Syntax modified
                        8 How to use Archive system modified
                        9.2 RICC Mobile Portal modified
                        10.1 Product manual modified
1.5      2010.03.17     5.1.3 Operate job modified
                        7.2 Time measurement modified
                        8 How to use Archive system modified
                        11.5.3 Create script (Amber8 : MDGRAPE-3 Cluster) modified
1.6      2010.04.01     1.1 Outline of the System modified
                        1.2 Hardware outline modified
                        5.1.1.3 Function outline and Job submit command format modified
                        5.1.1.5 Hardware resources modified
1.7      2010.06.07     3.1 Available file area modified
                        3.2.2 Data area modified
                        5.1.1.4 Major options for job submission modified
                        5.1.1.6 Software resource modified
                        5.1.1.8 Major options for job submission command modified
                        11.2.3 Specify temporary directory added
                        11.3.1 Create script modified
1.8      2010.07.30     3.2.2 Data area modified
                        7.2 Debugger added
                        8 Tuning added
1.9      2010.08.25     1.1 Outline of the System modified
                        1.2 Hardware outline modified
                        5.1.1.3 Function outline and Job submit command format modified
                        5.1.1.5 Hardware resources modified
                        5.1.1.6 Software resources modified
                        6.7 FTL generating tool : ftlgen added
                        12.2 GaussView modified
                        12.4 ANSYS modified
                        12.5 Amber modified
1.10     2010.11.19     1.3 Available application and libraries added
                        5.1.1.6 Software resources modified
                        12.3 NBO for Gaussian added
1.11     2011.05.02     5.1.5 Confirm project user job information added
                        6.6.12.5 FTL variable modified
                        12.1.1.6 Use scratch area on the local disk of computing node added
                        12.5 ANSYS modified
1.12     2011.08.17     5.1.1 Submit batch job modified
                        5.1.3.2 Hardware resources modified
                        5.2 Interactive job modified
1.13     2012.04.02     1.2.7 Cluster for single jobs using SSD added
                        1.5.1 Available computation time modified
                        3.1 Available file area modified
                        3.2.2 Data area modified
                        3.2.3 Local disk area (work area) modified
                        5.1.3.2 Hardware resources modified
                        5.1.3.3 Software resource modified
                        5.1.6.1 Display resource information modified
                        5.1.6.2 Display usage of core modified
                        12.1.1.1 Create script modified
1.14     2012.07.13     5.1.3.3 Software resource modified
                        12.6 Amber modified
1.15     2012.11.26     4.2 Compilation / Linkage for GPGPU program modified
1.16     2013.01.11     1.1 Outline of the System modified
                        1.2.2 Multi-purpose Parallel Cluster modified
                        5.1.3.2 Hardware resources modified
                        5.1.3.3 Software resources modified
                        12.7 GAMESS modified
1.17     2013.04.01     4.4.5 IMSL Fortran Numerical Library added
                        12.4 ANSYS modified
                        12.8 MATLAB added
1.18     2013.05.14     5.1.3.3 Software resources modified
                        12.9 Q-Chem added
1.19     2013.09.02     4.2 Compilation / Linkage for GPGPU program modified
                        5.1.3.2 Hardware resources modified
                        5.1.3.3 Software resources modified
                        5.1.6.1 Display resource information modified
                        12. Application removed (moved to RICC portal https://ricc.riken.jp)
1.20     2013.10.16     5.1.3.3 Software resources modified
                        5.1.6.1 Display resource information modified
1.21     2014.08.04     5.1.3.3 Software resources modified
1.22     2015.04.01     1. Outline of the system modified
                        2. How to Access modified
                        3. File Area modified
                        4. How to create jobs modified
                        5. How to execute Job modified
                        6. FTL (File Transfer Language) modified
                        9. How to use Archive system modified
1.23     2015.04.27     1.3 Software Overview modified
1.24     2015.09.14     1.5 Usage categories modified

Contents

Introduction
1. Outline of the System
  1.1 Outline of the System
  1.2 Hardware outline
  1.3 Software Overview
  1.4 Maintenance
  1.5 Usage categories
2. How to Access
  2.1 Login Flow
  2.2 Account and Authentication
  2.3 Update Password
  2.4 Access RICC
  2.5 Login environment
  2.6 File transfer
3. File Area
  3.1 Available file area
  3.2 Type of available file area
4. How to create jobs
  4.1 Outline of Compilation / Linkage
  4.2 Compilation / Linkage for GPGPU program
  4.3 Library management
  4.4 Linkage of Math library
  4.5 Job Freeze Function
5. How to execute Job
  5.1 Batch job / Interactive batch job
  5.2 Interactive Job
6. FTL (File Transfer Language)
  6.1 Introduction
  6.2 Transfer input file
  6.3 Transfer input directory
  6.4 Transfer output file
  6.5 FTL Basic Directory
  6.6 FTL Syntax
  6.7 FTL generating tool : ftlgen
7. Development Environment
  7.1 Endian conversion
  7.2 Debugger
8. Tuning
  8.1 Tuning overview
  8.2 Time measurement
  8.3 Program development support tool
  8.4 Network topology
9. How to use Archive system
  9.1 Configuration
  9.2 pftp
  9.3 hsi
  9.4 htar
10. RICC Portal
  10.1 RICC Portal
11. Manual
  11.1 Product manual
Appendix
1. FTL Examples
  1.1 Execute serial job
  1.2 Execute parallel job
  1.3 FTL basic directory (FTLDIR command)
  1.4 Others

INTRODUCTION
In this User's Guide, we explain the usage of the Supercomputer System (RICC, RIKEN Integrated Cluster
of Clusters) installed at RIKEN. Please read this document before you start using the system. This
User's Guide is available for reference and download at the following homepage. The contents of this
User's Guide are subject to change.

https://ricc.riken.jp

Shell scripts and other examples in this User's Guide are available in the following directory on RICC.

ricc.riken.jp:/usr/local/example

Please send inquiries about programming consultation, such as usage methods, debugging, parallelizing
or tuning programs, and any other questions about RICC to the following e-mail address.

Email: hpc@riken.jp

No portion of this document may be copied, reproduced, or distributed in any way, or by any means,
without permission.


1. Outline of the System


1.1 Outline of the System
RICC (RIKEN Integrated Cluster of Clusters) consists of two computing systems for different purposes
(massively parallel computing and multi-purpose parallel computing), a Frontend system, a 2.2PB disk
device, and a 2PB tape library system. The Massively Parallel Cluster, the core of the system, is a PC
cluster of 3888 cores (peak performance 45.6 TFLOPS) for massively parallel computing. The
Multi-purpose Parallel Cluster, equipped with GPU accelerators (peak performance 9.3 TFLOPS +
93.3 TFLOPS [single precision]), is intended for multi-purpose computing such as the execution of
commercial or free applications.

Users can edit, compile, and link programs, submit batch jobs, and obtain computed results through
the Login Server (ricc.riken.jp). Each computing server can also run interactive jobs, which users need
in order to debug their programs. In addition, users can access the system from outside the RIKEN
network through VPN and use the system as if they were on the RIKEN network.

Users can log in to RICC on the RIKEN network by ssh, scp, etc. RICC also provides a web portal site,
RICC Portal, which allows users to access RICC with a web browser on their PC. Users can edit,
compile, and link programs, submit batch jobs, and obtain computed results on RICC Portal.

In RICC, users' home directories are located on the high-speed magnetic disk device. Users can access
files in their home directories from the Login Server and the Multi-purpose Parallel Cluster. When
executing batch jobs on the Massively Parallel Cluster, users need to transfer the necessary files from
their home directories to the local disks of the Massively Parallel Cluster and return computed results
back to their home directories. These operations can be performed easily by commands in the shell
scripts used when submitting batch jobs.

All systems of RICC can be logged in to with the issued RICC user accounts, the RICC passwords, and
the passphrases of the public-key authentication method. Key pairs can be generated on RICC Portal.


1.2 Hardware outline


PC Clusters consist of Massively Parallel Cluster [486 nodes (3888 cores)] and Multi-purpose Parallel
Cluster [100 nodes (800 cores)].

1.2.1 Massively Parallel Cluster


Computation performance
  Intel Xeon X5570 (2.93GHz), 486 nodes (972 CPUs, 3888 cores)
  Total peak performance: 2.93 GHz x 4 calculations x 4 cores x 972 CPUs = 45.6 TFLOPS
Memory
  5.8TB (12GB x 486 nodes)
  Memory bandwidth: 25.58GB/s (= 1066MHz (DDR3-1066) x 8 Bytes x 3 channels)
  Byte/FLOP: 0.54 (= 25.58GB/s / (2.93GHz x 4 calculations x 4 cores))
HDD
  272TB ((147GB x 3 + 73GB) x 436 + (147GB x 6 + 73GB) x 50)
Interconnect (DDR InfiniBand)
  All 486 nodes with DDR InfiniBand HCA are configured as a computation network with two-way
  communication at 16 Gbps per direction.

1.2.2 Multi-purpose Parallel Cluster


Computation performance
  Intel Xeon X5570 (2.93GHz), 100 nodes (200 CPUs, 800 cores) + NVIDIA Tesla C2075 GPU
  accelerator x 100
  Total peak performance: 2.93GHz x 4 calculations x 4 cores x 200 CPUs = 9.3 TFLOPS
  1.03 TFLOPS (single precision) x 100 = 103 TFLOPS
Memory
  2.3 TB (24GB x 100 nodes)
  Memory bandwidth: 25.58GB/s (= 1066MHz (DDR3-1066) x 8 Bytes x 3 channels)
  Byte/FLOP: 0.54 (= 25.58GB/s / (2.93GHz x 4 calculations x 4 cores))
HDD
  25.0TB (250GB x 100 nodes)
Interconnect (DDR InfiniBand)
  All 100 nodes with DDR InfiniBand HCA are configured as a computation network with two-way
  communication at 16 Gbps per direction.


1.2.3 Frontend system


The Frontend system is the first host users log in to when accessing RICC. It also provides the
environment for program development and execution for the PC Clusters, the MDGRAPE-3 Cluster,
and the Large Memory Capacity Server. The Frontend system has 4 Login Servers, which are
connected to 2 load balancers for redundancy and high availability.

1.2.4 Cluster for single jobs using SSD


This cluster is used via RICC and provides an environment for non-parallel jobs that require
high-speed I/O.
Local disk area
  SSD 360GB (30GB / core)
Interconnect for data transfer
  QDR InfiniBand


1.3 Software Overview


Information about the available applications (Gaussian, Amber, etc.) and libraries (FFTW, GSL, HDF5,
Python libraries, etc.) on RICC is published at the following URL.

https://ricc.riken.jp/cgi-bin/hpcportal.2.2/index.cgi?LMENU=SYSTEM

The software available on the RICC system is listed as follows:

Table 1-1 Software overview

Category     Massively Parallel   Multi-purpose Parallel   Cluster for single jobs   Front End
             Cluster (MPC)        Cluster (UPC)            using SSD (SSC)           system

OS           Red Hat Enterprise Linux 5 (Linux kernel version 2.6)

Compiler     Fujitsu compiler
             Intel Parallel Studio XE Composer Edition for Fortran and C++ Linux

Library      Fujitsu's math libraries
               - BLAS, LAPACK, ScaLAPACK, MPI, SSLII, C-SSLII, SSLII/MPI
             Intel MKL
               - BLAS, LAPACK, ScaLAPACK

Application  GOLD/Hermes          Gaussian, Amber, ADF,    Gaussian, Amber, ADF,     GOLD/Hermes
                                  Q-Chem                   Q-Chem, GaussView

1.4 Maintenance
RICC basically operates 24 hours a day, 7 days a week, but emergency maintenance is performed
when needed. We make every effort to inform users of maintenance in advance.


1.5 Usage categories


We have the following user categories. Each user uses RICC in one of these categories.

General Use

Quick Use

For more information, please refer to "4. Usage Categories" in the RIKEN Supercomputer System Usage
Policy, which is available at the following URL.

http://accc.riken.jp/en/supercom/application/usage-policy/
1.5.1 Available computation time
Available computation time differs by project. Use the listcpu command to check the allotted
computation time, the used computation time, and the expiry date of the allotted computation time.
When the used computation time reaches 100%, jobs can no longer be submitted.

[username@ricc1:~] listcpu
[Q00100] Study of parallel programs           <-- Project no. / Project name
           Limit(h)     Used(h)    Use(%)   Date of expiry
----------------------------------------------------------------------
Total      402000.0     80400.0     20.0%   2016/03/31
 +- mpc                 80000.0
 +- upc                   400.0
 +- ssc                     0.0

[explanation]
Limit(h)        : Allotted computation time (unit: hour)
Used(h)         : Used computation time (unit: hour)
Use(%)          : Used computation time / Allotted computation time (unit: %)
Date of expiry  : Expiry date of the allotted computation time

1.5.2 List Project number / Project name


Use the listprj (or listproject) command to list project numbers and project names.
[username@ricc1:~] listprj
Q00001(Quick) Study of massively parallel programs on RIKEN Cluster of
Clusters
G00001(General) Research of RICC


2. How to Access
2.1 Login Flow
The login flow for the RICC system, from account application to login, is as follows:
When an account is issued, an e-mail with the client certificate attached is sent. After installing the
client certificate on your PC, access the RICC Portal. You can then log in to the front end servers via
SSH by registering your ssh public key on the RICC Portal.

Figure 2-1 Login Flow


2.1.1 Initial Settings


When accessing the system for the first time, log in to the RICC Portal and make sure to complete the
following initial settings:

2.1.2 Install Client Certificate

2.1.3 Generate / Register public key and private key

2.1.2 Install Client Certificate


2.1.2.1 Windows
Install the client certificate that ACCC sent you by e-mail.
Double-click the client certificate provided by ACCC. The Certificate Import Wizard starts. Click the
"Next" button.

Figure 2-2 The first screen of "Certificate Import Wizard"


Figure 2-3 The second screen of "Certificate Import Wizard"

1. Enter the password for the "Client Certificate" issued by ACCC.
2. Click the "Next" button.

Figure 2-4 The third screen of "Certificate Import Wizard"

Figure 2-5 The fourth screen of "Certificate Import Wizard"


Figure 2-6 The fifth screen of "Certificate Import Wizard"

Figure 2-7 The sixth screen of "Certificate Import Wizard"


2.1.2.2 Mac
Install the client certificate that ACCC sent you by e-mail.
Double-click the client certificate provided by ACCC.

Figure 2-8 The first screen of "Certificate Import Wizard"

Enter the password for the "Client Certificate" issued by ACCC.


2.1.3 Generate / Register public key and private key


When accessing RICC with a virtual terminal (ssh / scp, etc.), the authentication method is public-key
authentication, whether accessing from the RIKEN network or from outside it. Therefore, each user
needs to register the public key on the Login Server and store the private key on the PC / WS that
accesses RICC. The preparation flow is as follows.
(1) Access RICC Portal (refer to 2.1.3.1 Access RICC Portal)
(2) Generate and/or register a public key in either of the following ways.
  A) Generate a public key on RICC Portal
     Generate a key pair (a public key and a private key) on RICC Portal, and then
     store the private key on the terminal.
     (refer to 2.1.3.2 Generate public key and private key on RICC Portal)
  B) Generate a public key on the terminal (Mac, Linux, etc.) (for advanced users)
     Generate a key pair (a public key and a private key) on the terminal (Mac,
     Linux, etc.) accessing RICC, and then register the public key on RICC Portal.
     (refer to 2.1.3.3 Generate a public key on the terminal (Mac, Linux, etc.) (for
     advanced users))

2.1.3.1 Access RICC Portal


RICC users access RICC Portal (at the following URL) to generate a key pair.

https://ricc.riken.jp

1. Select the client certificate
2. Click [OK]


1. Enter RICC user account

2. Enter RICC password

3. Click [LOGIN]

Fig. 2-9 RICC Portal login window


2.1.3.2 Generate public key and private key on RICC Portal


(1) At the [Setting] - [Key Generation] menu, enter a public-key passphrase.
(Don't forget the public-key passphrase.)
1. Click [Setting]
2. Click [Key Generation]

3. Enter a passphrase
4. Retype the same passphrase
5. Select [SSH-2RSA]
6. Select OS(Software) type
7. Click [Generate Key]

Fig. 2-10 Public-key pair generation window


Save the private key into the PC.


[In case of Windows (for PuTTY, WinSCP)]
1. The private key is displayed
at the bottom of the window
2. Copy the private key strings
3. Save it into the terminal as a text file

Note 1: Save the text file with one of the following editors and character codes:
  notepad: ANSI
  wordpad: text document
Note 2: The extension of the text file should be .ppk
  (e.g. id_rsa.ppk)

Fig. 2-11 Private key display window


[In case of Mac(OS X)/UNIX/Linux]
1. The private key is displayed
at the bottom of the window
2. Copy the private key strings
3. Save it into the terminal as a text file
4. Change permission of the file to 600
e.g. $ chmod 600 ~/.ssh/id_rsa
(note)
Save the private key file as ~/.ssh/id_rsa. If it is saved in another directory or under another name,
specify the private key file when you access RICC with the ssh command as follows.
e.g. $ ssh -i private-key-file -l RICC-account ricc.riken.jp

Fig. 2-12 Private key display window

* Users can generate as many key pairs as they want. Previously registered public keys are not
deleted when a new key pair is generated.

2.1.3.3 Generate a public key on the terminal (Mac, Linux, etc.) (for advanced users)
(*) If you generated a public key as described in 2.1.3.2 Generate public key and private key on RICC
Portal, please skip this section.
(1) Use the ssh-keygen command on the terminal to generate a key pair on the terminal.

Mac (OS X): Start Terminal and execute the ssh-keygen command.

UNIX / Linux: Start a terminal emulator and execute the ssh-keygen command.


1. Enter the ssh-keygen command.
2. Press the return key. (If you save the key to a file other than ~/.ssh/id_rsa,
   enter a file name. (Note))
3. Enter a passphrase.
4. Retype the same passphrase.

(Note) In that case, specify the private key file when you access RICC with the
ssh command as follows.
Example) $ ssh -i private-key-file -l RICC-account ricc.riken.jp

Fig. 2-13 Generate a public-key pair
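For reference, a typical ssh-keygen session looks like the following (the exact prompts may differ
slightly depending on the OpenSSH version; the passphrase is not echoed):

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/username/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/username/.ssh/id_rsa.
Your public key has been saved in /home/username/.ssh/id_rsa.pub.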


(2) Access RICC Portal from a web browser and move to the Key Management window.

https://ricc.riken.jp
1. Click [Setting]

2. Click [Key Management]

3. Click [Update Public Key]

Fig. 2-14 Move to key management window


(3) Display the generated public key and register it on RICC Portal.

Mac (OS X): Start Terminal. Execute the cat command to display the public key.

UNIX / Linux: Start a terminal emulator. Execute the cat command to display the public key.
(Note) If the ssh-keygen command was executed with no argument at step (1), the public
key is stored in the ~/.ssh/id_rsa.pub file.
Command example: $ cat ~/.ssh/id_rsa.pub

1. Display the generated public key.
   $ cat "public-key-file"
2. Copy the content.

Fig. 2-15 Copy the content of the public key

1. Paste the content of the public key.
2. Select the key type.
3. Click [save]

Fig. 2-16 Register the public key


(4) Log out of RICC Portal.
Click [logout]

Fig. 2-17 RICC Portal logout


2.1.3.4 Delete registered public key


(1) Access RICC Portal from web browser.

https://ricc.riken.jp
Move to [Delete Public Key] window.

1. Click [Setting]
2. Click [Key Management]
3. Click [Delete Public Key]

Fig. 2-18 Move to Delete Public Key window


(2) Delete the registered public keys.

Click [Delete All Keys]


*All the registered public
keys are deleted.

Fig. 2-19 Deletion of public keys window

(3) Log out of RICC Portal.

Click [logout]

Fig. 2-20 RICC Portal logout


2.1.4 Network Access


Destination hosts are as follows:

Host name (FQDN)    Purpose of access
ricc.riken.jp       Usual access
riccgv.riken.jp     GaussView use (note 1)

note 1: For how to use GaussView, please refer to RICC Portal (https://ricc.riken.jp).

2.1.5 Available service


ssh/scp (Virtual terminal, file transfer)
https (RICC Portal, online manual)

2.1.6 Access to outside of the RIKEN network from RICC

When you access external systems from the RICC system, log in to the front end servers with SSH
agent forwarding enabled (the -A option).

[username@Your-PC ~]$ ssh -A username@greatwave.riken.jp

After logging in to HOKUSAI-GreatWave, log in to the RICC front end servers with SSH agent
forwarding enabled (the -A option).

[username@greatwave:~]$ ssh -A username@ricc.riken.jp
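Note that agent forwarding only works if your private key has been loaded into an ssh-agent on your
local PC beforehand. A minimal sketch using the standard OpenSSH commands (assuming the private
key is stored as ~/.ssh/id_rsa):

[username@Your-PC ~]$ eval `ssh-agent`          # start the agent if it is not already running
[username@Your-PC ~]$ ssh-add ~/.ssh/id_rsa     # load the private key (enter the public-key passphrase)
[username@Your-PC ~]$ ssh -A username@greatwave.riken.jp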


2.2 Account and Authentication


The account to access RICC, called the RICC user account, is the one the user specified in the
application form. However, the password to enter differs by access method. The password for each
access method is listed in Table 2-1 Access method list.
The RICC password is given to the user when the RICC user account is issued. Please change the
initial RICC password after logging in to RICC Portal for the first time.

Access method      Protocol          Account               Password
RICC Portal        https             RICC user account     RICC password
HPSS               pftp (note 1)     (specified in the     RICC password
Virtual terminal   ssh (scp/sftp)    application form)     Public-key passphrase (specified by user) (note 2)

Table 2-1 Access method list

note 1: pftp is a special command to transfer files between users' home directories and the Archive
system. pftp is an enhanced version of ftp and can be used in the same way as ftp.

note 2: The public-key passphrase is specified by the user when a key pair (a public key and a private
key) is generated. A key pair can be generated on RICC Portal. Please refer to 2.1.3 Generate /
Register public key and private key.


2.3 Update Password


When logging in to RICC for the first time, make sure to update the initial RICC password on RICC
Portal.
The password updating flow is as follows.
(1) Access RICC Portal (refer to 2.3.1 Access RICC Portal)
(2) Update the password (refer to 2.3.2 Password updating procedure)

2.3.1 Access RICC Portal


RICC users access RICC Portal (following URL) to update the initial password.

https://ricc.riken.jp
2.3.2 Password updating procedure
(1) Log in to RICC Portal.

Select the client certificate.

Fig. 2-21 RICC Portal login window


1. Enter RICC user account

2. Enter the initial password

3. Click [LOGIN]

Fig. 2-22 RICC Portal login window


(2) At the [Setting] - [Password Update] menu, update the initial password.
(If the initial password has not been updated on RICC Portal, the Password Update menu is shown
right after logging in to RICC Portal.)

1. Click [Setting]
2. Click [Password Update]
3. Enter the initial password
4. Enter a new password
5. Retype the same password
6. Click [Update]

Conditions for the password:
- At least 6 characters
- Not simple (e.g. a dictionary word)

Fig. 2-23 Password update window


(3) Confirm that the password was updated.


Click [OK]

Fig. 2-24 Confirmation of password update window


(4) Log out of RICC Portal

Click [logout]

Fig. 2-25 RICC Portal logout


2.4 Access RICC

2.4.1 Login
Use the ssh service to log in to RICC from a PC / WS. The ssh command for UNIX / Mac (OS X) and
PuTTY for Windows are recommended. PuTTY is available on the following website.
http://www.chiark.greenend.org.uk/~sgtatham/putty/

The host to access is as follows.

Host name (FQDN)
ricc.riken.jp

The login prompt varies at each login because the Login Servers (4 servers) are load-balanced by the
load balancers.

A) For UNIX / Mac (OS X)

% ssh -l username greatwave.riken.jp
The authenticity of host 'greatwave.riken.jp' can't be established.      <-- Displayed only at first-time login.
RSA key fingerprint is 26:8a:53:1e:d3:3f:ed:29:e0:a3:32:0d:d5:6e:1a:e2
Are you sure you want to continue connecting (yes/no)? yes               <-- Enter [yes]
Warning: Permanently added 'greatwave.riken.jp' (RSA) to the
list of known hosts.
Enter passphrase for key '/home/username/.ssh/id_rsa':                   <-- Enter the public-key passphrase
[username@greatwave1:~] ssh -l username ricc.riken.jp
The authenticity of host 'ricc.riken.jp' can't be established.           <-- Displayed only at first-time login.
RSA key fingerprint is 26:8a:53:1e:d3:3f:ed:29:e0:a3:32:0d:d5:6e:1a:e2
Are you sure you want to continue connecting (yes/no)? yes               <-- Enter [yes]
Warning: Permanently added 'ricc.riken.jp' (RSA) to the
list of known hosts.
Enter passphrase for key '/home/username/.ssh/id_rsa':                   <-- Enter the public-key passphrase

[username@ricc1:~]


B) For Windows

1. Specify the private key in a virtual terminal.

For PuTTY,
1. Go to the [Connection] - [SSH] - [Auth] menu
2. Click [Browse] and specify the private key created in
   2.1.3 Generate / Register public key and private key

Fig. 2-26 Private key setting window

2. Access RICC with a virtual terminal.

For PuTTY,
1. Click [Session]
2. Enter the following items
   Host name: greatwave.riken.jp
   Port: 22
   Connection type: SSH
3. Enter a session name at [Saved Sessions] (e.g. GreatWave)
4. Click [Save]
5. Click [Open]

Fig. 2-27 Virtual terminal (PuTTY) Session window


3. For the first-time login, the following security alert window is shown. Click [Yes].
   This alert is not shown at future logins.

Fig. 2-28 Virtual terminal (PuTTY) Security Alert window

4. Enter the RICC user account and the public-key passphrase.

1. Enter the RICC user account at [login as]
2. Enter the public-key passphrase

Fig. 2-29 Virtual terminal login completion

2.4.2 Logout
Enter exit or logout at the prompt. The logout process might take a little time due to post-processing
(writing the history file).


2.5 Login environment


In RICC, bash or tcsh is available as the login shell. The default is bash. If you want to change it,
please contact the Advanced Center for Computing and Communication (hpc@riken.jp).

An environment setting file for using RICC is stored in your login directory.

(note) To add paths to the environment variable PATH, add them to the end of PATH. Otherwise, you
may not be able to use the system properly.
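For example, to append a directory to the end of PATH (the directory $HOME/bin here is only an
illustration):

# bash (typically in ~/.bashrc)
export PATH=$PATH:$HOME/bin

# tcsh (typically in ~/.cshrc)
set path = ($path $HOME/bin)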

Also, original skeleton files are available in the following directory of Login Server.

ricc.riken.jp:/usr/local/example/skel


2.6 File transfer


2.6.1 File transfer of RICC
Use the ssh service to transfer files between RICC and a PC / WS. The scp (sftp) command for UNIX /
Mac (OS X) and WinSCP for Windows are recommended. WinSCP is available on the following
website.
http://winscp.net/eng/docs/

The host to access is as follows.

Host name (FQDN)
ricc.riken.jp
A) For UNIX / Mac (OS X)

% scp local-file username@greatwave.riken.jp:remote-dir
Enter passphrase for key:                              <-- Enter the public-key passphrase
file-name     100% |***********************|  file-size
[username@greatwave1:~] scp local-file username@ricc.riken.jp:remote-dir
Enter passphrase for key:                              <-- Enter the public-key passphrase

B) For Windows

Log in to RICC with WinSCP. Files can be transferred by drag & drop after login.

1. Log in to RICC with WinSCP.
   1. Click [New Site]
   2. Enter the following items
      Host name: greatwave.riken.jp
      Port number: 22
      User name: RICC user account
      Password: Public-key passphrase
   3. Click [Advanced]
   4. Open [Authentication]
   5. Enter the following item
      Private key file: (the private key file)
   6. Click [OK]

Fig. 2-30 WinSCP Login window


2. Files can be transferred by drag & drop.

Fig. 2-31 WinSCP after login window

Log in to GreatWave (greatwave.riken.jp) to transfer files to RICC:

[username@greatwave1:~] scp local-file username@ricc.riken.jp:remote-dir
Enter passphrase for key:                              <-- Enter the public-key passphrase
file-name     100% |***********************|  file-size

Files can also be uploaded / downloaded on RICC Portal using a web browser.
However, the upload / download function of RICC Portal cannot transfer multiple files at once.


3. File Area
3.1 Available file area
Available file areas are as follows.

Area                     Area name   Size                           Device
home (note 1)            /home       2.2PB                          Disk Storage System
data                     /data       4TB/user,                      Archive System
                                     (4TB~52TB)/Project
local disk (work area)   /work       depends on cluster (note 2)    computing node
archive                  /arc        2PB                            Archive System (ReadOnly)

Table 3-1 Available file area list

Note 1: home area is limited to less than 500GB per user by quota.
Note 2: the local disk area of each cluster is limited as follows:
  Massively Parallel Cluster           40GB/core
  Multi-purpose Parallel Cluster       10GB/core
  Cluster for single jobs using SSD    30GB/core

Available file areas for nodes are as follows.

File area                Login    Massively Parallel   Multi-purpose Parallel     Cluster for single
                         Server   Cluster              Cluster                    jobs using SSD
home (note 1)            O        O                    O                          O
data                     O        O                    O                          O
local disk (work area)   -        O (for prestaging)   O (scratch area for job)   O (scratch area for job)
archive                  O        -                    -                          -

O : Available for use
- : Not available

Table 3-2 Available file area for nodes


3.2 Type of available file area


3.2.1 Home area
Home area is a 2.2PB shared file system located on the Disk Storage System.
Home area is accessible from the Login Server, the Multi-purpose Parallel Cluster, and the Cluster for
single jobs using SSD.
Intended purpose:
  To store source programs, object files and execution modules
  To store small amounts of data

Usage of home area is limited to less than 500GB per user by quota.

3.2.2 Data area


Data area is a 2.2PB shared file system located on the Archive System.
Data area is accessible from the Login Server, the Multi-purpose Parallel Cluster, and the Cluster for
single jobs using SSD.
Intended purpose:
  Data sharing between project members
  To store large amounts of data

3.2.3 Local disk area (work area)


Local disk area (work area) is the local file system on the PC Clusters and the Cluster for single jobs
using SSD.
Intended purpose:
  Staging area for jobs (FTL)
  Scratch area while running jobs

Local disk area can be used by users' jobs, and the files are deleted when the jobs finish.
For the Massively Parallel Cluster, the area is limited to less than 40GB per core. The more cores a job
uses, the more capacity the job can use. For example, a job using 4 cores can use up to 160GB.
For the Multi-purpose Parallel Cluster and the Cluster for single jobs using SSD, the area can be used
as scratch area while running jobs.


4. How to create jobs


4.1 Outline of Compilation / Linkage
In RICC, programs are compiled and linked on the Login Server. Specify which machine to compile /
link a program for.

Compilation / linkage for GPGPU programs is done on the GPGPU Compile Server (accel). For more
information, please refer to 4.2 Compilation / Linkage for GPGPU program.

The format of compilation / linkage is as follows.

command machine-option [option] file [...]

command (serial / thread parallel)            f77, f90, cc, c++
command (MPI parallel / XPFortran parallel)   mpif77, mpif90, mpicc, mpic++, xpfrt
machine-option                                -pc
option                                        optional (options of each compiler)
file                                          source file, object file

Table 4-1 Format of Compilation / Linkage


The options for PC Clusters are as follows.

Option   Meaning
-c       output only object programs
-g       generate debugging information in object programs
-I       specify the directory for include files
-L       add a directory to the list of directories in which the linker searches for libraries
-l       search the library libname.so or libname.a
-o       specify the name of the execution module

Table 4-2 Common option list

Common optimization options are as follows.

Common option  Meaning                                 Fujitsu compiler  Intel compiler
-high (*1)     Optimize for high-speed execution       -Kfast            -O3 -ipo -no-prec-div -xHost
               on the machine
-middle        In addition to basic optimization,                        -O2
               loop unrolling, change of loop
               configuration, multiple loops, etc.
               are performed
-low           Basic optimization                                        -O1
-none          No optimization                                           -O0

Table 4-3 Common optimization option list

(*1) Specifying this option may give rise to side effects. Please pay attention.
Common options for thread parallelization are as follows.

Common option        Meaning                                       Fujitsu compiler  Intel compiler
-auto_parallel       Perform auto parallelization                  -Kparallel        -parallel
-auto_parallel_info  Display information on auto parallelization   -Kpmsg            -par-report
-omp                 Enable OpenMP directives                      -KOMP             -openmp

Table 4-4 Common thread option list


The following libraries are available in RICC.

Machine        Serial             Parallel
               Math library       Parallel library    Math library
PC Clusters    BLAS, LAPACK,      MPI, PVM            ScaLAPACK,
               SSL II, IMSL                           SSL II

Table 4-5 Available library

If the machine to run modules on is specified in the CLTK user configuration file (${HOME}/.cltkrc),
the machine-option (-pc) can be omitted for compilation / linkage.
* An option on the command line has priority over one in the CLTK configuration file.

The parameter of the CLTK user configuration file is as follows.

Parameter            Value  Meaning
CLTK_TARGET_MACHINE  pc     generate modules for PC Clusters

Table 4-6 Parameter of CLTK user configuration file

Example of the CLTK user configuration file:
CLTK_TARGET_MACHINE=pc

There are cautions on compilation / linkage of thread parallel programs and MPI parallel programs.
For more information, please refer to the product manuals. On how to refer to product manuals,
please refer to 11. Manual.


4.1.1 Compilation / Linkage for PC Clusters


Fujitsu compiler is used for PC Clusters.

4.1.1.1 Serial program


Use f77/f90/cc/c++ to compile / link serial programs for PC Clusters.

f77/f90/cc/c++ -pc [option] file [...]

1. Compile / link a Fortran77 program for PC Clusters. (optimization: high)

[username@ricc1:~] f77 -pc -high -o sample1.out sample1.f

2. Compile / link a C program for PC Clusters. (optimization: high)

[username@ricc1:~] cc -pc -high -o sample2.out sample2.c

4.1.1.2 Thread parallel program


Use f77/f90/cc/c++ to compile / link thread parallel programs for PC Clusters. Specify a common
option from Table 4-4 as thread-option.

f77/f90/cc/c++ -pc thread-option [option] file [...]

1. Compile / link a Fortran77 program for PC Clusters with auto parallelization.

[username@ricc1:~] f77 -pc -auto_parallel -o auto_para.out auto_para.f

2. Compile / link a C program including OpenMP for PC Clusters.

[username@ricc1:~] cc -pc -omp -o omp.out omp.c
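The number of threads used at run time by an OpenMP or auto-parallelized program is normally
controlled with the standard OpenMP environment variable OMP_NUM_THREADS (check the product
manuals for any compiler-specific variables). A minimal sketch for bash:

export OMP_NUM_THREADS=4    # run with 4 threads
./omp.out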

4.1.1.3 MPI parallel program


Use mpif77/mpif90/mpicc/mpic++ to compile / link MPI parallel programs for PC Clusters.

mpif77/mpif90/mpicc/mpic++ -pc [option] file [...]

1. Compile / link an MPI Fortran77 program for PC Clusters.

[username@ricc1:~] mpif77 -pc -o mpi_sample1.out mpi_sample1.f

2. Compile / link an MPI C program for PC Clusters.

[username@ricc1:~] mpicc -pc -o mpi_sample2.out mpi_sample2.c


4.1.1.4 XPFortran parallel program


Use xpfrt to compile / link XPFortran (former VPP Fortran) parallel programs.

xpfrt [option] file [...]

1. Compile / link an XPFortran program.

[username@ricc1:~] xpfrt -o xpf.out xpf.f


4.2 Compilation / Linkage for GPGPU program


Compilation / Linkage for GPGPU programs (CUDA programs (*)) is done on GPGPU Compile Server
(accel).
(*) For more information on CUDA (Compute Unified Device Architecture), please refer to the following
web site (CUDA ZONE).

http://www.nvidia.com/object/cuda_home_new.html
1. Log in to the GPGPU Compile Server (accel) from the Login Server.

[username@ricc1 ~] ssh accel
[username@upc0000 ~]

2. Compile / link GPGPU programs (CUDA C programs) (use of CUDA compiler)

[username@upc0000 ~] nvcc [OPTION] file [...]
or
[username@upc0000 ~] cc -nvidia [OPTION] file [...]

3. Compile / link GPGPU programs (CUDA Fortran programs) (use of PGI compiler)
[username@upc0000 ~] f90 [-pgi] [OPTION] file [...]

4. Compile / link GPGPU programs (CUDA MPI Fortran programs) (use of PGI compiler)
[username@upc0000 ~] mpif90 [-pgi] -ta=nvidia -Mcuda file [...]
(*) Without a machine-option, the PGI compiler is used by default on accel.

Example of a job script file for GPGPU programs (CUDA programs):

[username@ricc1:~] vi go.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -accel
#MJS: -cwd
#---- Program execution ----#
srun ./a.out


Example of a job script file for GPGPU programs (CUDA MPI Fortran programs):
[username@ricc1:~] vi go.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -accelex
#MJS: -proc 2
#MJS: -cwd
#---- Program execution ----#
mpirun -np 2 ./multi.exe

[note]
Jobs can be submitted on the Login Server (ricc1-4).
Jobs cannot be submitted on the GPGPU Compile Server (accel).
Specify -accel as the hardware resource to submit jobs using GPGPU (CUDA programs).
When -accel is specified, the job consumes 1 CPU (4 cores) as resource.
Specify -accelex as the hardware resource if you want to use 1 node exclusively. In this case, each
process consumes 8 cores of resource.
A job which uses 2 or more GPGPUs consumes 1 node (8 cores) per process.


4.3 Library management


The format of the archive command is as follows. Specify options of the ar command as option.

ar machine-option option archive [member...]

1. Create an archive for PC Clusters.

[username@ricc1:~] ar -pc cr libarchive.a sub1.o sub2.o sub3.o


4.4 Linkage of Math library


When math libraries are used with the Fujitsu C/C++ compiler, please also read the cautions in the
product manuals. On how to refer to product manuals, please refer to 11. Manual.

4.4.1 BLAS
Specify the -blas option to link the BLAS library. Specify the -blas_t option to link the BLAS library
for thread parallel.
1. Link the BLAS library for PC Clusters.

[username@ricc1:~] f77 -pc -blas -o blas.out blas.o

4.4.2 LAPACK
Specify the -lapack option to link the LAPACK library. Specify the -lapack_t option to link the
LAPACK library for thread parallel.
1. Link the LAPACK library for PC Clusters.

[username@ricc1:~] f77 -pc -lapack -o lapack.out lapack.o

4.4.3 ScaLAPACK
Specify the -scalapack option to link the ScaLAPACK library. Specify the -scalapack_t option to
link the ScaLAPACK library for thread parallel.
1. Link the ScaLAPACK library for PC Clusters.

[username@ricc1:~] mpif77 -pc -scalapack -o scalapack.out scalapack.o

4.4.4 SSLII
For PC Clusters, SSL II and C-SSL II are available. Specify the -SSL2 option to link the SSL II or
C-SSL II library.
1. Link an object compiled by Fortran77 for PC Clusters with the SSL II library.

[username@ricc1:~] f77 -pc -SSL2 -o ssl2.out ssl2_f.o

2. Link an object compiled by Fortran77 for PC Clusters with the SSL II library for thread parallel.
[username@ricc1:~] f77 -pc -SSL2 -o ssl2thread.out ssl2thread_f.o

3. Link an object compiled by C for PC Clusters with the SSL II library.

[username@ricc1:~] cc -pc -SSL2 -o ssl2.out ssl2_c.o

4. Link an object compiled by C for PC Clusters with the SSL II library for thread parallel.
[username@ricc1:~] cc -pc -SSL2 -o ssl2thread.out ssl2thread_c.o


4.5 Job Freeze Function


On RICC, the Job Freeze Function can save the status of a running, not-yet-completed job to a file
before a halt of system operation. When system operation is restarted, this function restores the job
from the file and then restarts it.

4.5.1 Jobs as the targets of the Job Freeze Function


Job Freeze applies to jobs that meet the following condition:

Program compiled by a Fujitsu compiler

To enable job freeze, the job program must be compiled and linked by a Fujitsu compiler and
linked with the job freeze library. (The job freeze library is linked automatically by the compilers
described in 4.1.1 Compilation / Linkage for PC Clusters.)

4.5.2 Jobs excluded from the targets of the Job Freeze Function
The Job Freeze Function cannot always freeze all jobs. The jobs and the internal information
described below are excluded from Job Freeze targets. An attempt to freeze or defrost such jobs may
fail. Even if one of these jobs has been frozen and defrosted successfully, its operation may be
unpredictable.

Job concerned with time

If a job using time information is frozen and defrosted, the time information for the period from the
start of freezing to the end of defrosting is lost. The same applies to a job that uses a timer
process.

Job concerned with the i-node number of a file

For example, an i-node number acquired with the system call stat(2) may change during the
period between job freezing and job defrosting.

Job process whose standard output is redirected to a file

Since the file is overwritten after job defrosting, the job loses output written before job freezing.

Job process whose standard input is redirected to a file

Since the file cannot be sought back to the file position set before job freezing, job operation after
job defrosting will be unpredictable.

Job cooperating or sharing a resource with an external process

When a job cooperating or sharing a resource with an external process is frozen, the external
status related to the job cannot be saved by job freezing. An example of such jobs is a job that
exchanges data with other jobs via files.

Job using a profiler

When a job uses a profiler, the status of the job cannot be saved because the profiler may
communicate with processes outside the job. Freezing of a job using a profiler fails.

Shell scripts
When a job uses a script language (perl, python, etc.), the Job Freeze will fail.


Job using an unsupported file system

The Job Freeze Function does not support the following file systems:
/dev, procfs, namefs, etc.
If the program uses any of these file systems when Job Freeze is performed, the Job
Freeze will fail.

Job opening a directory

Freezing of a job that currently has a directory open fails.

Job using the I/O event notification facility (epoll system call)

Freezing of a job that is using the epoll system call fails.

Interactive job

Interactive batch job

Job not using srun, mpirun, or xpfrun

The Job Freeze Function freezes processes launched by srun, mpirun, or xpfrun. Other processes
are not frozen.


Job executing srun, mpirun, or xpfrun repeatedly in a for or while statement

When defrosting a job, the job script is restarted. The Job Freeze Function saves the line number of
the job script and defrosts the job at that point. Therefore, when executing srun, mpirun, or xpfrun
repeatedly in a for or while statement, the Job Freeze Function may not work properly. However,
if the job script is written to save and restore its status at the point of job freezing, as in the
following example, the Job Freeze Function can work properly.
[username@ricc1:~] vi go.sh
#!/bin/bash
#------ qsub option --------#
#MJS: -pc
#MJS: -proc 16
#MJS: -time 10:00:00
#MJS: -eo
#MJS: -cwd
#----- FTL command -----#
#BEFORE: a.out
#BEFORE: input.1
#AFTER: output.*
#---- Program execution ----#
start=1
end=100
if [ -f ${QSUB_REQID}_index ]; then
  start=`cat ${QSUB_REQID}_index`
fi
for (( i = $start; i <= $end; i = i + 1 )); do
  echo $i > ${QSUB_REQID}_index
  mpirun -stdinfile input.${i} ./a.out > output.${i}
  cp output.${i} input.$((i+1))
done
rm ${QSUB_REQID}_index


5. How to execute Job


There are 3 types of jobs. (Refer to Table 5-1 Type of Job.)
For batch jobs, the necessary resources for computing, such as cores and memory, are exclusively
allocated. In addition to batch jobs, which do not need to receive input from a terminal, jobs which
need input from a terminal can be executed as interactive batch jobs.
Interactive jobs are executed sharing the resources for interactive jobs by time-sharing.

Job type               Purpose                                Occupancy of           Start of execution
                                                              resource
Batch job              Execute a job as batch type            Yes                    When resources
Interactive batch job  Execute a job as interactive type      Yes                    are allocated
Interactive job        Execute a program (debug etc.)         No (time sharing       Immediate
                       which is preferred to run              with other
                       immediately rather than to occupy      interactive jobs)
                       cores / memory

Table 5-1 Type of Job

Batch jobs are classified into 4 types by submission pattern.

Batch job type           Purpose                               Procedure of submission
Normal batch job         Execute a job in each script          5.1.1.1 Submit Normal batch job
Chain job                Execute a set of jobs in the          5.1.1.2 Submit chain job
                         specified order
Bulk job                 Execute jobs in the same script       5.1.1.3 Submit bulk job
                         that are managed as one job
Coupled calculation job  Execute a set of jobs started at      5.1.1.4 Submit coupled calculation job
                         the same time

Table 5-2 Type of Batch Job


5.1 Batch job / Interactive batch job


5.1.1 Submit batch job
5.1.1.1 Submit Normal batch job
Use the qsub command with the script file name as an argument to submit batch jobs.

qsub [option] script-file [...]

Example) Batch job submission
% qsub go.sh                             <-- Submit a batch job
Request 123777.jms submitted to MJS.

The above message (REQUEST-ID) is displayed at job submission.
If blank characters are included in the current directory path, an error message is displayed. In that
case, please rename the directory containing blank characters.

5.1.1.2 Submit chain job


Chain jobs are executed sequentially in the order specified on the submit command line. Two or more
jobs are never executed at the same time.
Specify two or more script files separated by commas (,) without white space to submit chain jobs
with the qsub command.

qsub [option] script-file,script-file[,script-file[,...]]

When a job composing a chain job is cancelled by the qdel command, all subsequent jobs are also
cancelled.

Example of chain jobs which use the output file of a job as the input file of the next job:
(1) Prepare scripts
Prepare scripts which transfer the output file (output.x) of the previous job by FTL and use it as the
input file of the next job. (go1.sh, go2.sh, go3.sh)


go1.sh (output file: output.1)

#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#AFTER: output.1
mpirun ./a.out -o output.1

go2.sh (input file: output.1, output file: output.2)

#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: output.1
#AFTER: output.2
mpirun ./a.out -i output.1 -o output.2

go3.sh (input file: output.2, output file: output.3)

#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: output.2
#AFTER: output.3
mpirun ./a.out -i output.2 -o output.3

(2) Submit the chain job
Specify the prepared script files separated by commas without white space.
[username@ricc1:~] qsub go1.sh,go2.sh,go3.sh


5.1.1.3 Submit bulk job


A bulk job is a structure that allows execution of the same program with the same resources multiple
times but with different input files. A bulk job can be submitted, controlled as a single unit. Each job
(subjob) in the bulk job shares the same bulk ID but has a unique bulk index.
Specify -B option and the range of BULK INDEX ID from start number <StartNO> to end number
<EndNO>. Steps of Bulk Index ID can be specified by Step number <StepNO> for qsub command.
It facilitates handling of output and input. The environment variable MJS_BULKINDEX is avalable to
refer bulk index.

qsub -B <StartNO>-<EndNO>[:<StepNO>][option] script-file


A bulk job or a part of sub jobs can be cancelled at once by specifying Bulk ID or Bulk Index ID.
Bulk ID is set to the environment variable MJS_BULKID. Bulk index ID is set to the environment
variable MJS_BULKINDEX. Input or output files can be switched using bulk index ID.
Prepare input files
Prepare input files to use in each subjob.
Sub job [1] inputfile:

input.1

Sub job [2] inputfile:

input.2

Sub job [3] inputfile:

input.3

Prepare script file


Prepare script file for bulk job. The environment variable of each subjob is set to variable
MJS_BULKID and variable MJS_BULKINDEX.
#!/bin/sh
#MJS: -proc 8
#MJS: -time 1:00:00
#MJS: -eo
#MJS: -cwd
#BEFORE: a.out
#BEFORE: input.${MJS_BULKINDEX}
#AFTER:

output.${MJS_BULKINDEX}

mpirun ./a.out -i input.${MJS_BULKINDEX} -o output.${MJS_BULKINDEX}


Submit bulk job
Specify the -B option to submit a bulk job.
[username@ricc1:~] qsub -B 1-3 go-bulkjob.sh
Bulk Request 145678.jms Submitted to MJS.

For the above example, the bulk job is given bulk ID 145678 and each subjob is given bulk index ID
1, 2 and 3. The environment variables, input file name and output file name of each subjob are as
follows.

Bulk ID   Bulk Index ID   Environment variables    Input file name   Output file name
--------------------------------------------------------------------------------------
145678    1               MJS_BULKID=145678        input.1           output.1
                          MJS_BULKINDEX=1
          2               MJS_BULKID=145678        input.2           output.2
                          MJS_BULKINDEX=2
          3               MJS_BULKID=145678        input.3           output.3
                          MJS_BULKINDEX=3

5.1.1.4 Submit coupled calculation job


Two or more jobs submitted as a coupled calculation job are started at the same time. Coupled calculation
jobs are not started until all jobs have been allocated computing resources.
Specify two or more script files separated by colons (:) to execute coupled calculation jobs (jobs that
start at the same time), as shown in the example below.

qsub [option] script-file:script-file[:script-file[:...]]


When a job that composes a coupled calculation job is cancelled by the qdel command, the other jobs
are also cancelled.
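
Example) A minimal sketch (the script names go-a.sh and go-b.sh are hypothetical): two scripts
submitted as one coupled calculation job. Neither job starts until computing resources for both
have been allocated.

[username@ricc1:~] qsub go-a.sh:go-b.sh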

5.1.1.5 Confirm completion of Batch job


When a job completes, a standard output file and a standard error output file are created in the
directory where the job was submitted.
The standard output of the executed job is written to the standard output file. Error messages, if
any errors occur, are written to the standard error output file.

[Execution result files of PC Clusters, MDGRAPE-3 Cluster]

Request-name.oXXXXX.jms --- Standard output file
Request-name.eXXXXX.jms --- Standard error output file
(XXXXX is the REQUEST-ID displayed at job submission)

[Execution result files of Large Memory Capacity Server]

Request-name.oXXXXX --- Standard output file
Request-name.eXXXXX --- Standard error output file
(XXXXX is the REQUEST-ID displayed at job submission)



Caution
On the PC Clusters and MDGRAPE-3 Cluster, when several batch jobs that each write hundreds of
megabytes or more to standard output/error finish at the same time, transferring the standard
output/error files takes a long time because the load on the job scheduler becomes high. All users'
job termination processes are delayed as a result.

Therefore, if the size of standard output/error is large, please redirect it to normal files as shown below.

Example 1: Redirect standard output/error to the file "$MJS_REQID.log"

Using bash as login shell:
#------- FTL command -------#
#AFTER:0: $MJS_REQID.log                  <-- FTL command
#------- Program Execution -------#
mpirun ./a.out >> $MJS_REQID.log 2>&1     <-- Redirect output to $MJS_REQID.log

Using tcsh as login shell:
#------- FTL command -------#
#AFTER:0: $MJS_REQID.log                  <-- FTL command
#------- Program Execution -------#
mpirun ./a.out >>& $MJS_REQID.log         <-- Redirect output to $MJS_REQID.log

Example 2: Suppress writing to standard output/error

Using bash as login shell:
mpirun ./a.out > /dev/null 2>&1
Using tcsh as login shell:
mpirun ./a.out >& /dev/null


5.1.2 Submit interactive batch job


Use the qsub command with the -i option to execute jobs interactively. At job submission,
specify qsub options as arguments of the command.

qsub -i [option] [script file]

Example) Interactive batch job submission

% qsub -i -pc -proc 4 -mem 4048mb       <-- Submit an interactive batch job
Request 123777.jms submitted to MJS.    <-- Notification of submission completion
Request 123777.jms start                <-- Notification of start of execution
[username@mpc0025~ ] mpirun ./a.out
(output of job execution)
[username@mpc0025~ ] exit               <-- End job
                                        (Notification of job completion)

If the job submission is successful, the system returns a notification of submission completion to the
prompt, and the job stays in waiting status on the terminal until it starts. When resources are allocated,
a notification of start of execution is displayed and the resources become available. If the user performs
no operation on the terminal for 10 minutes, the user is logged out automatically.

If the current directory path contains blank characters, an error message is displayed. In that
case, please rename the directory so that it contains no blank characters.


5.1.3 Function outline and Job submit command format


(1) The resources are classified into the following three categories.
    A) Basic resources:    Number of cores, elapsed time, amount of memory
    B) Hardware resources: Resources that depend on hardware
    C) Software resources: Resources that depend on software (applications such as ISV products)

(2) Specify resources following #MJS: in scripts.


Example
#MJS: -pc -amber
#MJS: -proc 4 -mem 1024mb
#MJS: -time 12:00:00
(3) Don't use colons (:) or commas (,) in a script file name because they have special meanings.
(4) Chain jobs, bulk jobs and coupled calculation jobs cannot be used as interactive batch jobs.
(5) A user can submit up to 500 jobs per project.
(6) A user can submit up to 5000 bulk jobs per project, including normal jobs.
(7) There are limits of using numbers of cores at the same time.
General Use: Up to 3888 cores per project
Quick Use: Up to 256 cores per project


5.1.3.1 Major options for job submission

Major options for the job submission command are as follows.

Command option                  Meaning
--------------------------------------------------------------------------------
-proc <PROCNO>                  Specify a number of processes (cores) (default: 1)
-thread <THREADNO>              Specify a number of threads (cores) (default: 1)
-mem[ory] <MEMSIZE>[kb|mb|gb]   Specify amount of memory per process
                                default unit: mb (PC Clusters / MDGRAPE-3 Cluster)
                                              gb (Large Memory Capacity Server)
-hdd <HDDSIZE>[kb|mb|gb]        Specify HDD size per process
                                default unit: gb
-time <hh:mm:ss>|<sssss>        Specify running time (elapsed time)
                                format: hh(hours):mm(minutes):ss(seconds) or sssss(seconds)
                                default: refer to Table 5-5 Available hardware resource
-eo                             Merge standard output and standard error output (default: not merge)
                                (*) Invalid for interactive batch job
-oi                             Output statistical information of the job to standard output
                                (default: not output)
                                (*) For an interactive batch job, statistical information is
                                output to a file.
-mb                             Send an email when a job starts (default: not send)
-me                             Send an email when a job ends (default: not send)
-mu <email address>             Email address (default: email address in the application form)
-r <REQNAME>                    Specify a request name (default: script name or STDIN)
-rerun [Y|N]                    Specify whether a job restarts in case of trouble. Y: restart (default: N)
                                (*) Invalid for interactive batch job
-chaindel [Y|N]                 Specify whether subsequent jobs are deleted when a chain job ends
                                abnormally. Y: delete (default: Y)
                                (*) Invalid for non chain job
-cwd                            Move to the directory where the job was submitted when the job starts
                                (default: home directory)
-comp[iler] <COMPTYPE>          Specify the compiler that generated the modules [fj|intel|gcc|pgi|nvidia]
                                fj: Fujitsu compiler, intel: Intel compiler, gcc: GNU compiler,
                                pgi: PGI compiler (using GPGPU), nvidia: CUDA compiler (using GPGPU)
                                default: fj (except for Large Memory Capacity Server)
                                         intel (Large Memory Capacity Server)
-para[llel] <PARALLEL>          Specify the parallel execution environment that linked the modules
                                [fjmpi|xpf|mpt|pvm]
                                fjmpi: Fujitsu MPI, xpf: XPFortran, mpt: Message Passing Toolkit,
                                pvm: Parallel Virtual Machine
                                default: fjmpi (except for Large Memory Capacity Server)
                                         mpt (Large Memory Capacity Server)
-project <PRJ-ID>               Specify a project ID (for users who have two or more projects)
-B <N>-<M>[:S]                  Submit a batch job as the specified number of bulk jobs.
                                <N> Start number of bulk job
                                <M> End number of bulk job
                                <S> Number of steps of bulk job (default: 1)
                                (*) Invalid for interactive batch job
-bmb                            Send an email when the first bulk subjob starts (default: not send)
                                (*) Invalid for non bulk job
-bmab                           Send an email when all bulk subjobs start (default: not send)
                                (*) Invalid for non bulk job
-bme                            Send an email when the first bulk subjob ends (default: not send)
                                (*) Invalid for non bulk job
-bmae                           Send an email when all bulk subjobs end (default: not send)
                                (*) Invalid for non bulk job
-fstype [ftl|share]             Presence of FTL specification (default: share)

Table 5-3 Major options for job submit command


5.1.3.2 Hardware resources


Hardware resources to be specified are as follows.

Hardware resource   Computing server system to use
------------------------------------------------------------------------------
-pc                 PC Clusters (Massively Parallel Cluster, Multi-purpose Parallel Cluster) (*)
-mpc                Massively Parallel Cluster
-upc                Multi-purpose Parallel Cluster
-accel              Multi-purpose Parallel Cluster (GPGPU)
-accelex            Multi-purpose Parallel Cluster (GPGPU: 1-node possession)
-ssc                Cluster for single jobs using SSD

Table 5-4 Hardware resource list

(*) If the -pc option is specified and no software resource (e.g. -g03, -adf and so on) is specified
when submitting a job to the PC Clusters, the job is executed on either the Massively Parallel Cluster or
the Multi-purpose Parallel Cluster.
However, because the home area (/home) and data area (/data) are not shared on the Massively Parallel
Cluster, files are transferred by FTL (refer to 6 FTL (File Transfer Language)) at the start/end of the
job. Therefore, the location of the output file differs between the Massively Parallel Cluster and the
Multi-purpose Parallel Cluster.

Example
#!/bin/sh
#MJS: -pc
#MJS: -proc 1
#MJS: -eo
#MJS: -cwd
#FTLDIR: $MJS_CWD
srun ./a.out > output.log

The output.log will be created as follows:
* Executed on Massively Parallel Cluster:     $MJS_CWD/REQUEST-ID/output.log.0
* Executed on Multi-purpose Parallel Cluster: $MJS_CWD/output.log

So, specify a hardware resource as follows:
* Execute a job on Massively Parallel Cluster: -mpc
* Execute a job on Multi-purpose Parallel Cluster: -upc
* Execute a job on Massively Parallel Cluster or Multi-purpose Parallel Cluster: -pc


Number of cores, amount of memory and elapsed time depend on the hardware resource.

Hardware            Number of available       Max. elapsed   Amount of memory to             Local disk size to
resource (*1)       cores per job (*2)        time to        specify per process (*4)        specify per process
                    Quick Use   General Use   specify (*3)
----------------------------------------------------------------------------------------------------------------
-pc                 1-128       1-128         72 H           [executed on Massively Parallel Cluster]
(PC Clusters)       129-256     129-512       24 H           default 1,200MB (max. 9,600MB) / default 40GB (max. 320GB)
                    -           513-3803      6 H            [executed on Multi-purpose Parallel Cluster]
                                                             default 2,600MB (max. 20,800MB) / default 10GB (max. 80GB)

-mpc                2-128       2-128         72 H           default 1,200MB                 default 40GB
(Massively          129-256     129-512       24 H           (max. 9,600MB)                  (max. 320GB)
Parallel Cluster)   -           513-8192      6 H

-upc/-accel/        1-128       1-128         72 H           default 2,600MB                 default 10GB
-accelex            129-256     129-512       24 H           (max. 20,800MB)                 (max. 80GB)
(Multi-purpose      -           513-800       6 H
Parallel Cluster
/GPGPU (*5))

Table 5-5 Available hardware resource

Caution
(*1) One hardware resource must be specified. (Two or more hardware resources cannot be specified.)
(*2) The number of available cores is the number of processes x the number of threads.
(*3) If the -time option is omitted when a job is submitted, the maximum elapsed time is set according
to the number of cores assigned to the job.
(*4) The amount of memory per process can be specified up to the maximum value for the hardware
resource in the table. When more memory than the default is specified, computation time is charged
based on the number of cores effectively occupied by the specified amount of memory.
(Example) Executing a 2-process parallel job that specifies 30GB of memory per process on the
Large Memory Capacity Server:
Specified amount of memory per process 30GB = default amount of memory 15GB x 2,
which is equivalent to the memory of 2 cores.
Computation time of 2 cores x 2 processes in the parallel job = the job uses the computation time of 4 cores.


(*5) On GPGPU, when -accelex is specified, the job exclusively occupies one node per process.
Computation time for all cores of those nodes is therefore charged regardless of the specified number
of cores. When -accel is specified, the job occupies one CPU (4 cores) per process.
(*6) This cluster has 12 cores per node, unlike the other clusters. Please take care when you specify
the degree of parallelism.

The Massively Parallel Cluster and Multi-purpose Parallel Cluster have 2 CPUs (4 cores / CPU) per
computing node. Jobs using 1 core share a CPU. However, parallel jobs that specify two or more cores
occupy whole CPUs (4 cores). Therefore, if a job occupies more cores than specified, computation time
is charged accordingly.

[Figure: Jobs using 1 core share a CPU (4 cores). A parallel job using 2 or more cores always
occupies whole CPUs, and computation time is charged according to the number of occupied CPUs;
cores left unused in an occupied CPU are not available to other jobs.]


5.1.3.3 Software resource


Software resources to be specified are as follows.

Software resource   Execution software   Computing server system to use
--------------------------------------------------------------------------------------------------
-g03                Gaussian03           Multi-purpose Parallel Cluster / Cluster for single jobs
                                         using SSD / Large Memory Capacity Server
-g09                Gaussian09           Multi-purpose Parallel Cluster / Cluster for single jobs
                                         using SSD / Large Memory Capacity Server
-g03nbo             NBO 5.G              Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-g09nbo             NBO 5.9              Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-g09nbo6            NBO 6.0              Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-adf                ADF2013.01           Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-adf2010            ADF2010.02           Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-gamess             GAMESS(socket)       Massively Parallel Cluster / Multi-purpose Parallel Cluster /
                                         Cluster for single jobs using SSD
-gamess_mpi         GAMESS(MPI)          Massively Parallel Cluster / Multi-purpose Parallel Cluster /
                                         Cluster for single jobs using SSD
-amber8             Amber8               MDGRAPE-3 Cluster / Multi-purpose Parallel Cluster /
                                         Cluster for single jobs using SSD
-amber10            Amber10              Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-amber11            Amber11              Multi-purpose Parallel Cluster (GPGPU) / Multi-purpose Parallel
                                         Cluster / Cluster for single jobs using SSD
-amber12            Amber12              Multi-purpose Parallel Cluster (GPGPU) / Multi-purpose Parallel
                                         Cluster / Cluster for single jobs using SSD
-amber14            Amber14              Multi-purpose Parallel Cluster (GPGPU) / Multi-purpose Parallel
                                         Cluster / Cluster for single jobs using SSD
-ansys              ANSYS                Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-clustalw           ClustalW             Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-blast              BLAST                Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-hmmer              HMMER                Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-fasta              FASTA                Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-cluster3           CLUSTER 3.0          Multi-purpose Parallel Cluster / Cluster for single jobs using SSD
-qchem              Q-Chem 4.1           Multi-purpose Parallel Cluster / Cluster for single jobs
                                         using SSD / Large Memory Capacity Server

Table 5-6 Software resource list


Based on the specified software resource, the number of processes or the elapsed time that can be
specified may differ from the values determined by the hardware resource. Available resources to
specify are as follows.
(** means the same value as for the hardware resource shown in Table 5-5 Available hardware resource.)

Software resource    Hardware resource       Number of      Number of      Max.      Amount of
                     to specify              processes to   threads to     elapsed   memory to
                                             specify (*1)   specify (*2)   time      specify
------------------------------------------------------------------------------------------------
-g03/-g09            -pc/-upc/-ssc           **             **             **        **
-g03nbo/-g09nbo/     -pc/-upc/-ssc           **             **             **        **
-g09nbo6
-adf/-adf2010        -pc/-upc/-ssc           **             **             **        **
-gamess              -pc/-mpc/-upc/-ssc      **             **             **        **
-gamess_mpi          -pc/-mpc/-upc/-ssc      **             **             **        **
-amber10             -pc/-upc/-ssc           **             **             **        **
-amber11             -pc/-upc/-accel/        **             **             **        **
                     -accelex/-ssc
-amber12             -pc/-upc/-accel/        **             **             **        **
                     -accelex/-ssc
-amber14             -pc/-upc/-accel/        **             **             **        **
                     -accelex/-ssc
-clustalw            -pc/-upc/-ssc           **             **             **        **
-blast               -pc/-upc/-ssc           **             **             **        **
-hmmer               -pc/-upc/-ssc           **             **             **        **
-fasta               -pc/-upc/-ssc           **             **             **        **
-cluster3            -pc/-upc/-ssc           **             **             **        **

Table 5-7 Available software resource


---- Note ---(*1)

It is a number of processes generated for job execution. It is specified by "-proc" of qsub


option.

(*2)

It is a number of threads generated for job execution. It is specified by "-thread" of qsub


option.

(*3)

The number of ANSYS Solver license is 1. Therefore, only one job using ANSYS can be
executed at the same time.


5.1.3.4 Other job properties


Job property   Meaning                        How to specify
-----------------------------------------------------------------------------------
-pri           Priority of a job              Specify 0 - 65535. Default is 100.
                                              The larger the value, the higher the priority.
                                              Example: #MJS: -pri 10000
-start_time    Time when a job starts (*1)    Format: [[YYYY/]MM/DD-]HH:MM
                                              Example: #MJS: -start_time 2009/10/01-09:00

Table 5-8 Job property list

Caution
(*1) If the specified resources cannot be secured by the specified time, the status changes from
WAIT (WIT) to TIME OVER (TOV). Jobs in TOV status can be deleted but do not start.

5.1.3.5 Major options for job submission command


5.1.3.5.1 Specify a number of processes

-proc <PROC-NO>

The number of cores specified by PROC-NO is allocated as processes for a job. If this option is omitted,
PROC-NO is set to 1. Specify the number of processes to execute a parallel job with interprocess
communication, such as an MPI program or XPFortran program.

Caution
If PROC-NO x THREAD-NO of "-thread <THREAD-NO>" (see the next section) exceeds the maximum
number of cores that can be specified, a job submission error occurs. Please specify a proper number
of cores when submitting a job.

5.1.3.5.2 Specify a number of threads

-thread <THREAD-NO>

The number of cores specified by THREAD-NO is allocated as threads for a job. If this option is omitted,
THREAD-NO is set to 1. Specify the number of threads to execute a parallel job that generates threads.
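
For example (an illustrative sketch; the values are arbitrary), a hybrid MPI + thread job that runs
4 processes with 2 threads each uses 4 x 2 = 8 cores, so the script options could be written as:

#MJS: -proc 4
#MJS: -thread 2

The product PROC-NO x THREAD-NO must not exceed the maximum number of cores that can be specified.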


5.1.3.5.3 Specify amount of memory

-mem <MEMSIZE>[kb|mb|gb]

The amount of memory per process specified by MEMSIZE is secured for job execution. Units that can
be specified are kb, mb or gb (default: mb (PC Clusters / MDGRAPE-3 Cluster), gb (Large Memory
Capacity Server)). Blank characters must not be put between MEMSIZE and the unit. If this option is
omitted, the default amount of memory is set. (Please refer to Table 5-5 Available hardware resource.)
(Example 1)  -mem 800mb  -->  800MegaByte = 800 x 1024KiloByte = 800 x 1024 x 1024Byte
(Example 2)  -mem 8gb    -->  8GigaByte = 8 x 1024MegaByte = 8 x 1024 x 1024KiloByte

5.1.3.5.4 Specify HDD size (for PC Clusters, MDGRAPE-3 Cluster)

-hdd <HDDSIZE>[kb|mb|gb]

The HDD size per process is specified by HDDSIZE for job execution. Units that can be specified are kb,
mb or gb. Blank characters must not be put between HDDSIZE and the unit. This option is for users who
need a large local disk area. If this option is omitted, the default local disk size is set.
(Example 1)  -hdd 2000mb  -->  2000MegaByte = 2000 x 1024KiloByte = 2000 x 1024 x 1024Byte
(Example 2)  -hdd 10gb    -->  10GigaByte = 10 x 1024MegaByte = 10 x 1024 x 1024KiloByte

5.1.3.5.5 Specify elapsed time

-time <ELAPSETIME>

The job is executed within the specified elapsed time. If the job does not end within the elapsed time,
it is forcibly deleted. This prevents the job from wasting resources when it goes into an infinite loop,
etc. If this option is omitted, the maximum elapsed time is set according to the number of cores assigned
to the job (refer to 5.1.3.2 Hardware resources, Table 5-5 Available hardware resource). The elapsed time
specified by ELAPSETIME is given in the format HH:MM:SS (HH: hours, MM: minutes, SS: seconds) or
SSSSS (seconds).
(Example 1)  -time 24:10:10  -->  24 hours 10 minutes 10 seconds
(Example 2)  -time 3600      -->  3600 seconds
(Example 3)  -time 59:01     -->  59 minutes and 1 second

[Backfill function]
The job scheduler determines job priorities based on the users' usage of resources and decides which
job starts next. However, the backfill function lets the job scheduler start other low-priority jobs as
long as they do not delay the highest-priority job. Therefore, a job may start earlier if a proper
ELAPSETIME is specified for it.

5.1.3.5.6 Merge standard output and standard error output

-eo
Merge a standard error output file with a standard output file. If this option is omitted, a standard error
output file and a standard output file are generated separately.

5.1.3.5.7 Send email at start of job

-mb
At the start of the job, an email is sent to the address in the application form.

5.1.3.5.8 Send email at end of job

-me
At the end of the job, an email is sent to the address in the application form.

5.1.3.5.9 Specify Request name

-r <REQNAME>

The job is executed with the request name REQNAME. If this option is omitted, the request name of the
job is the script file name. Blank characters must not be included in the request name.
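
For example (a sketch; the request name "myjob" is hypothetical), the following submission makes the
job appear with the name "myjob" in the NAME column of qstat:

[username@ricc1:~] qsub -r myjob go.sh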

5.1.3.5.10 Specify if subsequent jobs are deleted when the chain job ends abnormally

-chaindel [Y|N]
Specify whether subsequent jobs are deleted when a chain job ends abnormally (Y: delete, N: do not delete).

Caution
A) If this option is omitted, the default is -chaindel Y (delete subsequent jobs). However, if -rerun Y
(rerun the job) is specified, the default is -chaindel N (execute subsequent jobs).
B) Job submission with both -rerun Y and -chaindel Y (delete subsequent jobs) fails with the
following error message.
qsub: ERROR: 0016: invalid options: cannot enable -chaindel Y and -rerun Y at the same time.
C) This option is valid for chain jobs. It is ignored for non chain jobs.
D) It is possible to submit jobs that specify -chaindel and jobs that do not specify -chaindel together
as one chain job. An example follows.
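
Example) A sketch using the chain job syntax from 5.1.1.2 (the script names are hypothetical). With
-chaindel N, the subsequent jobs keep running even if an earlier job in the chain ends abnormally:

[username@ricc1:~] qsub -chaindel N go1.sh,go2.sh,go3.sh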

5.1.3.5.11 Move to directory where job is submitted when job starts

-cwd
A job executes its script in the home directory by default. Specify the -cwd option to execute the script
in the directory where the job was submitted.

5.1.3.5.12 Specify project number

-project <PROJECT-NO>

Specify a project number for job execution. Project numbers are IDs issued when applications are
approved by the administrator. (Users who have only one project number don't need this option. This
option is for users who have two or more projects.)
A default project number can be specified by the variable MJS_QSUB_PROJECT in the .cltkrc file (CLTK
user configuration file) located in the home directory.
Example) Edit the .cltkrc file
[username@ricc1:~] vi $HOME/.cltkrc
MJS_QSUB_PROJECT = G00001     <-- Specify a default project number

5.1.3.5.13 Submit bulk jobs

-B <N>-<M>[:S]

Submit a batch job as bulk jobs. Specify the range of bulk index IDs by <N>-<M>. A step for the bulk
index ID can be specified by S. Please refer to 5.1.1.3 Submit bulk job.
Example) Submit 50 subjobs as a bulk job
[username@ricc1:~] qsub -B 1-50 go.sh
Bulk Request 5381290.jms Submitted to MJS.
[username@ricc1:~] qstat
REQID              NAME    STAT   ELAPSE  START-TIME    CORE
--------------------------------------------------------------------
5381290[1].jms     go.sh   RUN    00:00   06/09 14:37
5381290[2].jms     go.sh   RUN    00:00   06/09 14:37
5381290[3].jms     go.sh   RUN    00:00   06/09 14:37
5381290[4].jms     go.sh   RUN    00:00   06/09 14:37
5381290[5].jms     go.sh   RUN    00:00   06/09 14:37
5381290[6].jms     go.sh   RUN    00:00   06/09 14:37
5381290[7].jms     go.sh   RUN    00:00   06/09 14:37
5381290[8-50].jms  go.sh   QUE    --:--   --/-- --:--

Example) Submit 13 subjobs as a bulk job with a step number

[username@ricc1:~] qsub -B 1-25:2 go.sh
Bulk Request 5381341.jms Submitted to MJS.
[username@ricc1:~] qstat
REQID                           NAME    STAT   ELAPSE  START-TIME    CORE
--------------------------------------------------------------------
5381341[1].jms                  go.sh   RUN    00:00   06/09 14:37
5381341[3].jms                  go.sh   RUN    00:00   06/09 14:37
5381341[5].jms                  go.sh   RUN    00:00   06/09 14:37
5381341[7].jms                  go.sh   RUN    00:00   06/09 14:37
5381341[9].jms                  go.sh   RUN    00:00   06/09 14:37
5381341[11].jms                 go.sh   RUN    00:00   06/09 14:37
5381341[13].jms                 go.sh   RUN    00:00   06/09 14:37
5381341[15,17,19,21,23,25].jms  go.sh   QUE    --:--   --/-- --:--

5.1.3.5.14 Send email at start of first bulk subjob

-bmb
When any one bulk subjob starts, an email is sent to the address in the application form.

5.1.3.5.15 Send email at start of all bulk subjobs

-bmab
When all bulk subjobs have started, an email is sent to the address in the application form.

5.1.3.5.16 Send email at end of first bulk subjob

-bme
When any one bulk subjob ends, an email is sent to the address in the application form.

5.1.3.5.17 Send email at end of all bulk subjobs

-bmae
When all bulk subjobs have ended, an email is sent to the address in the application form.


5.1.3.6 Execution command


Batch jobs and interactive batch jobs run the execution commands specified after the job submission
options, using the resources specified at submission.

5.1.3.6.1 Execution command for PC Clusters


For the PC Clusters, the following commands are available.

Execution command (*1)   Meaning
-----------------------------------------------------------------------
srun                     Serial program
                         Thread parallel program (maximum number of threads: 8)
mpirun                   MPI parallel program, hybrid parallel program (MPI + thread)
xpfrun                   XPFortran parallel program

---- Note ----
(*1) Options for the execution commands are not necessary since the number of cores (MPI parallel or
thread parallel) and the resources to use are already specified at job submission.

Example 1)  srun ./serial.out     Execute a serial program
Example 2)  mpirun ./mpi.out      Execute an MPI parallel program
Example 3)  xpfrun ./xpf.out      Execute an XPFortran program

5.1.3.6.2 Execution command for MDGRAPE-3 Cluster


The same commands as for the PC Clusters are available.

5.1.3.6.3 Execution command for Large Memory Capacity Server


For the Large Memory Capacity Server, the following commands are available.

Execution command (*1)   Meaning
-----------------------------------------------------------------------
srun                     Serial program
                         Thread parallel program (maximum number of threads: 32)
mpirun                   MPI parallel program, hybrid parallel program (MPI + thread)

---- Note ----
(*1) Options for the execution commands are not necessary since the number of cores (MPI parallel or
thread parallel) and the resources to use are already specified at job submission.


Example 1)  srun ./serial.out     Execute a serial program
Example 2)  mpirun ./mpi.out      Execute an MPI parallel program


5.1.3.7 Script file for Batch job


Create scripts with vi, emacs, etc. to submit batch jobs. Except for the resource names (hardware
resources and software resources), script files can be used on all the computing server systems.

5.1.3.7.1 Script for PC Clusters


Since the Massively Parallel Cluster cannot access the home area, transfer execution files to the
computing nodes' local disk area in advance of job execution by specifying the file transfer function
(FTL) in scripts. On FTL, please refer to 6 FTL (File Transfer Language).

We explain a script for a job which needs the following resources.

- Number of processes (cores): 8 cores
- Amount of memory: 1200MB
- Elapsed time: 10 H
- Merge standard error output and standard output: Yes
- Restart the job in case of trouble: Yes
- Move to directory where job is submitted: Yes

[username@ricc1:~] vi go-pc.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -pc                 Specify hardware resource
#MJS: -proc 8             Specify a number of processes
#MJS: -mem 1200mb         Specify amount of memory
#MJS: -time 10:00:00      Specify elapsed time
#MJS: -eo                 Merge standard output / error output
#MJS: -rerun Y            Restart job in case of trouble
#MJS: -cwd                Move to directory where job is submitted
#------- FTL command -------#
#FTLDIR: $MJS_CWD         Specify file transfer (Note)
#------- Program execution -------#
mpirun ./para.out         Execute job

Table 5-9 Script for PC Clusters


Don't change "#!/bin/sh" in the first line or the "#MJS:" job option lines (lines 3 to 9), because
these have special meanings.


Note) Files in the directory where the job is submitted are automatically transferred to the computing
nodes' local disk area in advance of job execution. Files in that directory on the computing nodes are
also collected after job execution. Except on the Massively Parallel Cluster, this command is ignored.
When FTLDIR is used, unnecessary files may be transferred. Also, the existence of files is checked after
job execution even when there are no files to transfer. For large-scale parallel jobs these costs may be
high, so please use BEFORE and AFTER instead of FTLDIR, as sketched below.
For more information on BEFORE and AFTER, please refer to 6 FTL (File Transfer Language).
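
As a minimal sketch of this approach (the file names input.dat and output.log are hypothetical), the
FTL section of the script in Table 5-9 could name only the files that actually need to be transferred:

#------- FTL command -------#
#BEFORE: para.out
#BEFORE: input.dat
#AFTER: output.log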

5.1.3.7.2 Script for Large Memory Capacity Server


We explain a script for a job which needs the following resources.

- Number of processes (cores): 8 cores
- Amount of memory: 30GB
- Elapsed time: 10 H
- Merge standard error output and standard output: Yes
- Restart the job in case of trouble: Yes
- Move to directory where job is submitted: Yes

[username@ricc1:~] vi go-ax.sh
#!/bin/sh
#------ qsub option --------#
#MJS: -ax                 Specify hardware resource
#MJS: -proc 8             Specify a number of processes
#MJS: -mem 30gb           Specify amount of memory
#MJS: -time 10:00:00      Specify elapsed time
#MJS: -eo                 Merge standard output / error output
#MJS: -rerun Y            Restart job in case of trouble
#MJS: -cwd                Move to directory where job is submitted
#------- FTL command -------#
#FTLDIR: $MJS_CWD
#------- Program execution -------#
mpirun ./para.out         Execute job


Table 5-10 Script for Large Memory Capacity Server


Don't change "#!/bin/sh" in the first line or the "#MJS:" job option lines (lines 3 to 9), because these
have special meanings.


5.1.4 Confirm job information


Use the qstat command to confirm job information.

qstat [-d|-m|-p|-e|-w|-project] [REQID]


Option     Meaning
-----------------------------------------------------------------------
(none)     Display job information
-d         In addition to job information, display the directory where each job was submitted
-m         In addition to job information, display the amount of memory in use
-p         In addition to job information, display priority
-e         Display completion information
-w         Display reason for waiting
-project   Display jobs of the specified project (for users who have two or more projects)

5.1.4.1 qstat command


Display the user's own currently submitted job list.
[username@ricc1:~] qstat
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID            NAME     STAT  ELAPSE  START-TIME   CORE
-------------------------------------------------------------------
12342.jms        go.sh    RUN   12:34   07/28 12:00
12348.jms        go.sh    QUE   --:--   --/-- --:--
12412[1].jms     bulk.sh  RUN   12:04   07/28 12:30
12412[2].jms     bulk.sh  RUN   12:04   07/28 12:30
12412[3-10].jms  bulk.sh  QUE   --:--   --/-- --:--

REQID:       REQUEST-ID
             * Bulk job (running): "Bulk ID"["Bulk Index ID"]
             * Bulk job (waiting): "Bulk ID"["start of Bulk Index ID"-"end of Bulk Index ID"]
NAME:        REQUEST-NAME (if omitted, the request name is the script file name)
STAT:        Batch job status
             (RUN: running, QUE: waiting to run, WIT: waiting for the specified start time,
              HLD: hold (*1), END: end (*2), TOV: time over of the specified start time (*3))
ELAPSE:      Elapsed time (HH:MM)
START-TIME:  Time when the job started (MM/DD HH:MM)
CORE:        Number of allocated cores
(*1): State where the job is prevented from starting
(*2): Only for coupled calculation jobs
(*3): State where the job is held because it could not start at the specified start time.

5.1.4.2 qstat command (directory where job is submitted)


With the -d option, the directory where each job was submitted is displayed in addition to the job
information.
[username@ricc1:~] qstat -d
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID            NAME     STATUS  ELAPSE  START-TIME   CORE  SUBMIT-DIR
--------------------------------------------------------------------
12342.jms        go.sh    RUN     12:34   07/28 12:00        $HOME/JOB
12348.jms        go.sh    QUE     --:--   --/-- --:--        $HOME/JOB
12412[1].jms     bulk.sh  RUN     12:04   07/28 12:30  1     $HOME/JOB
12412[2-10].jms  bulk.sh  QUE     --:--   --/-- --:--        $HOME/JOB

[Meaning of additional information]
SUBMIT-DIR:  Directory where the job was submitted

5.1.4.3 qstat command (amount of memory in use)


With the -m option, memory information is displayed in addition to the job information. Memory
information is updated periodically. The maximum amount of memory in use across all processes is
displayed.
[username@ricc1:~] qstat -m
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID      NAME   STATUS  ELAPSE  START-TIME   CORE  MEMORY
--------------------------------------------------------------------
12342.jms  go.sh  RUN     12:34   07/28 12:00        500M
12348.jms  go.sh  QUE     --:--   --/-- --:--        --

[Meaning of additional information]
MEMORY:  Maximum amount of memory in use

5.1.4.4 qstat command (priority)


With the -p option, priority is displayed in addition to the job information.
[username@ricc1:~] qstat -p
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID      NAME   STAT  ELAPSE  START-TIME   CORE  PRI
--------------------------------------------------------------------
12342.jms  go.sh  RUN   12:34   07/28 12:00        100
12348.jms  go.sh  QUE   --:--   --/-- --:--        100

[Meaning of additional information]
PRI:  Job priority


5.1.4.5 qstat command (finished job)


With the -e option, the list of finished jobs is displayed.
[username@ricc1:~] qstat -e
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID         NAME     START-TIME   END-TIME     CORE  MEM(*)  SUBMIT-DIR
--------------------------------------------------------------------
12321.jms     go.sh    07/21 08:00  07/28 14:21  896   500M    $HOME/JOB1
4649.ax       go.sh    07/25 12:40  07/28 16:04        20G     $HOME/axjob
12324.jms     go.sh    07/28 12:00  07/28 13:09  896   500M    $HOME/JOB
12412[1].jms  bulk.sh  07/28 12:00  07/28 13:09        500M    $HOME/JOB
12412[2].jms  bulk.sh  07/28 12:00  07/28 13:09        500M    $HOME/JOB

(*) If the amount of memory cannot be obtained, "-" is displayed.


5.1.4.6 qstat command (reason for waiting)


For the 10 highest-priority jobs that are waiting to run, display the reason for waiting and the
estimated time when the jobs will start. The estimated time may differ from the actual start time
because it depends on the execution situation of other jobs, etc.
[username@ricc1:~] qstat -w
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID      NAME     STAT  CORE  MEM  ESTIMATE  REASON
-------------------------------------------------------------------
13574.jms  go.sh    QUE   1024  --   < 6hrs    Insufficient cores
13575.jms  sim.sh   QUE         --   > 24hrs   Insufficient license
4695.ax    go.sh    QUE   16    --   < 12hrs   Insufficient memory
13577.jms  go.sh    QUE   1024  --   > 24hrs   Insufficient cores
13577.jms  go-1.sh  QUE   1024  --   > 3days   Insufficient cores
13577.jms  go-2.sh  QUE   1024  --   < 3days   Chain job
*****************************************************************************
The estimation time is transitorily changed by the job execution or submission.
*****************************************************************************

[Meaning of additional information]
MEM:       Specified amount of memory (if not specified, "-" (hyphen) is displayed)
ESTIMATE:  Estimated time when the job starts
           < 6hrs    The job will start within 6 hours.
           < 12hrs   The job will start within 12 hours.
           < 24hrs   The job will start within 24 hours.
           < 3days   The job will start within 3 days.
           > 3days   The job will start after 3 days or more.
REASON:    Reason for waiting to run
           Insufficient cores         Cores are insufficient
           Insufficient memory       Amount of memory is insufficient
           Insufficient license      Licenses are insufficient
           Otherjob booking cores    Other jobs inhibit the job from running
           Chain job                 A preceding chain job inhibits the job from running
           Upper limit of project    The limit on the number of cores the project can use
                                     at a time inhibits the job from running
           Over specified start time The job cannot run because the specified start time has passed


5.1.4.7 qstat command (project number)


Display jobs of the specified project number. This function is only available for users who have two or
more projects.
[username@ricc1:~] qstat -project G00001
[G00001] Research of RICC                 <-- Display project number (G00001)
REQID      NAME   STAT  ELAPSE  START-TIME   CORE
--------------------------------------------------------------------
12342.jms  go.sh  RUN   12:34   07/28 12:00
12348.jms  go.sh  QUE   --:--   --/-- --:--

5.1.5 Display standard output / standard error output


Use the qcat command to display a submitted script file, or a running job's standard output or
standard error output file.

qcat [-o|-e|-s] REQID

Option   Meaning
-----------------------------------------------------------------------
(none)   Display the running job's standard output file.
-o       Display the running job's standard output file.
-e       Display the running job's standard error output file.
-s       Display the job's script file.

5.1.6 Confirm resource information


Display information on the usage of the system and available resources.

qstat [-x|-uc|-um]

Option   Meaning
-----------------------------------------------------------------------
-x       Display available resources and limits.
-uc      Display usage of cores in the system.
-um      Display usage of memory on the Large Memory Capacity Server.


5.1.6.1 Display resource information


Display hardware and software resources which the user can specify for each project.
[username@ricc1:~] qstat -x
[Q00001] Study of massively parallel programs on RIKEN Cluster....
H_RESOURCE  MAX_CORE/J  MAX_CORE/P  SUBMIT  ELAPSE  MEMORY   RUN       QUEUED
-----------------------------------------------------------------------
pc                      256         10/500                   0( 189)   0( 206)
+- mpc      256                             72H     10240mb  10( 180)
+- upc      256                             72H     21200mb  0(   6)
+- ssc      96                              72H     43200mb  0(   0)

S_RESOURCE[pc(mpc)]  MAX_PROC/J  MAX_THREAD/J  ELAPSE  MEMORY
-----------------------------------------------------------------------
amber14              1
g09                  1
g09nbo               1
g09nbo6              1
gamess
gamess_mpi

S_RESOURCE[pc(upc)]  MAX_PROC/J  MAX_THREAD/J  ELAPSE  MEMORY
-----------------------------------------------------------------------
adf                  1
adf2010              1
adf2013              1
adf2014              1
amber10              1
amber11              1
amber12              1
amber14              1
blast                1
clustalw             1
cluster3             1
fasta                1
g03                  1
g03nbo               1
g09                  1
g09nbo               1
g09nbo6              1
gamess
hmmer                1
visit                160         8             24H
visitgpu

S_RESOURCE[pc(ssc)]  MAX_PROC/J  MAX_THREAD/J  ELAPSE  MEMORY
-----------------------------------------------------------------------
adf                  1
adf2010              1
adf2013              1
adf2014              1
amber10              1
amber11              1
amber12              1
amber14              1
blast                1
clustalw             1
cluster3             1
...

Item                     Meaning
-----------------------------------------------------------------------
H_RESOURCE               Hardware resource
MAX_CORE/J               Max. number of cores to specify per job (default value in parentheses)
MAX_CORE/P               Max. number of cores the project can use at a time
SUBMIT                   Number of submitted jobs / max. number of jobs the project can submit
ELAPSE                   Max. elapsed time to specify per job
MEMORY                   Max. amount of memory to specify per job (default value in parentheses)
RUN                      Number of running jobs of the project (total value in parentheses)
QUEUED                   Number of waiting jobs of the project (total value in parentheses)
S_RESOURCE[pc(mpc)]      Available software resources for Massively Parallel Cluster
S_RESOURCE[pc(upc)]      Available software resources for Multi-purpose Parallel Cluster
S_RESOURCE[pc(accel)]    Available software resources for Multi-purpose Parallel Cluster (GPU)
S_RESOURCE[pc(accelex)]  Available software resources for Multi-purpose Parallel Cluster (GPU)
MAX_PROC/J               Max. number of processes to specify per job
MAX_THREAD/J             Max. number of threads to specify per job
Application_NAME         Software resource that has a limitation on the number of simultaneous uses
USE/MAX                  Number of cores in use / max. number of available cores

5.1.6.2 Display usage of core


Display the usage of cores in the system.
[username@ricc1:~] qstat -uc
The status of CORE use                                 RATIO(USED/ALL)
-----------------------------------------------------------------------
mpc  *********************************-------          84.9%(3232/3888)
upc  ****************************************         100.0%(0800/0800)
ssc  *********************-------------------          54.6%(0118/0216)

The current usage of cores in each system is displayed. RATIO(USED/ALL) shows the ratio of use (%),
the number of cores in use and the maximum number of cores.


5.1.7 Confirm project user job information


Use the prjstat command to confirm the job lists of users who belong to the same project.

prjstat [-r]

Option   Meaning
-----------------------------------------------------------------------
-r       Sort by request ID

5.1.7.1 prjstat command


Display the currently submitted job list of users in the same project.
Example: Display the job list of users who belong to project G00001
[username@ricc1:~] prjstat
[G00001] Research of RICC
USER   REQID        NAME      STAT  ELAPSE  START-TIME   CORE
---------------------------------------------------------------------
userA  1234567.jms  go.sh     RUN   01:40   02/15 13:40  128
userB  1234599.jms  test.sh   RUN   01:46   02/15 13:34
userC  1234600.jms  run.sh    QUE   --:--   --/-- --:--  256
userC  1234601.jms  run.sh    QUE   --:--   --/-- --:--  512
userD  1234301.jms  run-d.sh  RUN   49:40   02/13 13:34  32

Item   Meaning
USER   Username that submitted the job

5.1.7.2 prjstat command (display job list in order of request ID)


With the -r option, the job list is displayed in order of request ID.
[username@ricc1:~] prjstat -r
[G00001] Research of RICC
REQID        NAME      USER   STAT  ELAPSE  START-TIME   CORE
---------------------------------------------------------------------
1234301.jms  run-d.sh  userD  RUN   49:40   02/13 13:34  32
1234567.jms  go.sh     userA  RUN   01:40   02/15 13:40  128
1234599.jms  test.sh   userB  RUN   01:46   02/15 13:34
1234600.jms  run.sh    userC  QUE   --:--   --/-- --:--  256
1234601.jms  run.sh    userC  QUE   --:--   --/-- --:--  512


5.1.8 Operate job


5.1.8.1 Operate files of a running job
These commands operate on files in the local disk area of the computing nodes, so they are only for
jobs running on the Massively Parallel Cluster.

5.1.8.1.1 Display file list on computing node


Use the qls command to display running job's file list on computing nodes' local disk area.

qls REQID[@RankNO] [OPTION]

The ls command's options are available as OPTION.
Specify the REQUEST-ID and process number (@RankNO) to display the file list in the job execution
directory.
Example: Display files in the job execution directory of rank 0 of REQUEST-ID 13562.jms
[username@ricc ~] qls 13562.jms@0
result_file
exec.tar.gz
go.sh

5.1.8.1.2 Get file of running job


Use the qget command to transfer files of a running job from the computing nodes' local disk area.

qget REQID[@RankNO] SRC ... [DEST]

From the Login Server, get the files of the specified REQUEST-ID and process (@RankNO) on the
computing nodes' local disk area, for example into the home area.
Example) Get a file (result) of rank 0 of a running job (REQUEST-ID 13579.jms)
[username@ricc ~]$ qls 13579.jms -l                 <-- Confirm files in the job
total 48
-rwxr-xr-x 1 username group 46555 Jul 22 13:45 resultfile
[username@ricc ~]$ qget 13579.jms@0 result /tmp     <-- Execute qget command
[username@ricc ~]$ ls /tmp/result                   <-- Confirm the files are transferred
/tmp/result


5.1.8.1.3 Put result file


Use the qput command to put files from the computing nodes' local disk area to other nodes or to the
home area.

qput [OPTION] [@SRC_Rank:] SRC DEST
qput [@SRC_Rank:] SRC @DEST_Rank_LIST

Option   Meaning
-del     Delete source files after putting files.

The qput command can be invoked from a script.

Example) Put resultfile into the directory where the job was submitted, during job execution.
[username@ricc ~] vi go-qput.sh
#!/bin/sh
#------ Option Set for qsub command --------#
#MJS: -pc
#MJS: -proc 8
#MJS: -time 10:00:00
#MJS: -rerun Y
#MJS: -cwd
mpirun ./a.out > resultfile
qput resultfile $MJS_CWD      Copy to the directory where the job was submitted
mpirun ./b.out


5.1.8.2 Cancel job


Use the qdel command to cancel a job. Use the qd command to cancel two or more jobs interactively.

qdel [-K|-collect] REQID
qd   [-K|-collect]

Option    Meaning
-----------------------------------------------------------------------
(none)    Cancel a job
-K        Cancel a job and delete the standard output / error output files
          (except for Large Memory Capacity Server).
-collect  Cancel a job and collect files on the computing nodes
          (except for Massively Parallel Cluster).

5.1.8.2.1 Example of qdel command


Confirm the REQUEST-ID of the job to delete.
[username@ricc1:~] qstat
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID            NAME     STAT  ELAPSE  START-TIME    CORES
--------------------------------------------------------------------
12342.jms        go.sh    RUN   12:34   07/28 12:00
12348.jms        go.sh    RUN   0:20    07/28 00:14
12356.jms        go.sh    RUN   0:05    07/28 00:29
12412[1].jms     bulk.sh  RUN   0:05    07/28 00:29
12412[2].jms     bulk.sh  RUN   0:05    07/28 00:29
12348[3-10].jms  bulk.sh  QUE   --:--   --/-- --:--

Specify the job's REQUEST-ID (REQID) as the argument of the qdel command.
[username@ricc ~] qdel 12348.jms
Request 12348.jms has been deleted.
[username@ricc ~] qdel 5963.ax
Request 5963.ax has been deleted.

If the standard output / error output is not necessary on the PC Clusters, specify the -K option. This
is especially useful to cancel a job in EXITING status.
[username@ricc ~] qdel -K 12342.jms
Request 12342.jms has been deleted.

Specify the -collect option to cancel the job and collect the files the job generated on the computing
nodes of the PC Clusters.
[username@ricc ~] qdel -collect 12356.jms
Request 12356.jms is running, and has been signalled.

Specify the bulk ID to delete a whole bulk job.
[username@ricc1:~] qdel 12412.jms
Bulk Request 12412.jms has been deleted.

Specify "Bulk ID"["Bulk Index ID"] to delete an individual bulk subjob.
[username@ricc1:~] qdel 12412[1].jms
Request 12412[1].jms has been deleted.

Specify "Bulk ID"["Bulk Index ID list"] to delete two or more bulk subjobs at the same time.
[username@ricc1:~] qdel 12412[1,3,5-10].jms
Bulk Request 12412[1,3,5-10].jms has been deleted.

5.1.8.2.2 Example of qd command


Display the submitted job list with the qd command.
[username@ricc1:~] qd
NO  REQID            NAME     STAT  ELAPSE  START-TIME    CORES
--------------------------------------------------------------------
1   12342.jms        go.sh    RUN   12:34   07/28 12:00
2   12348.jms        go.sh    RUN   0:20    07/28 00:14
3   12412[1].jms     bulk.sh  RUN   0:05    07/28 00:29
4   12412[2].jms     bulk.sh  RUN   0:05    07/28 00:29
5   12348[3-10].jms  bulk.sh  QUE   --:--   --/-- --:--
qd: input NO:

Enter the NO of the job to cancel. If two or more jobs are to be cancelled, specify them separated by
commas (,) or using a hyphen (-). If all jobs are to be cancelled, enter "all". Enter "q" or "quit" to
quit the qd command.
qd: input NO: 2
qd: Are you sure? (yes|no)? yes

Request 12348.jms is running, and has been signalled.
Request 5963.ax has been deleted.
qd: Normal end

On the qd command, running bulk subjobs are handled as individual jobs, and waiting bulk subjobs are
handled as one job as a whole.

5.1.8.3 Delete completed job information


Specify the -e option to delete completed job information.

qdel -e REQID
qd   -e

5.1.8.3.1 Example of qdel command


Display the list of completed jobs.
[username@ricc1:~] qstat -e
[Q00001] Study of massively parallel programs on RIKEN Cluster....
REQID      REQNAME  START-TIME   END-TIME     CORES  MEM   SUBMIT-DIR
--------------------------------------------------------------------
12321.jms  go.sh    07/21 08:00  07/28 14:21  896    500M  $HOME/JOB1
12324.jms  go.sh    07/28 12:00  07/28 13:09  896    500M  $HOME/JOB

Specify the REQUEST-ID (REQID) of the job to delete it from the list of completed jobs.
[username@ricc ~] qdel -e 12321.jms
Request 12321.jms was deleted from jobhistory-file.

(*) 500 completed jobs are preserved. When the number of ended jobs reaches 500 or more, the oldest
one is deleted.


5.1.8.3.2 Example of qd command


Specify the -e option as an option of the qd command.
[username@ricc1:~] qd -e
NO  REQID      REQNAME  START-TIME   END-TIME     CORES  MEM   SUBMIT-DIR
----------------------------------------------------------------------
1   12321.jms  go.sh    07/21 08:00  07/28 14:21  896    500M  $HOME/JOB1
2   12322.jms  go.sh    07/25 12:40  07/28 16:04  8      20G   $HOME/JOB2
3   12324.jms  go.sh    07/28 12:00  07/28 13:09  128    1.2G  $HOME/JOB3

Enter the NO of the job to delete. If two or more jobs are to be deleted, specify them separated by
commas (,), blank characters or a hyphen (-). If all jobs are to be deleted, enter "all". Enter "q" or
"quit" to quit the qd command.

qd: input NO: 1
qd: Are you sure? (yes|no)? yes

Request 12321.jms was deleted from jobhistory-file.
Request 4649.ax was deleted from jobhistory-file.
qd: Normal end

5.1.8.4 Alter priority of job


Alter the priority of a submitted job.

qalter -p <PRIORITY> <REQID>

Specify a priority from 0 to 65535 following the -p option (default: 100).
The priority of a submitted bulk job cannot be changed.
[username@ricc ~] qalter -p 200 12343.jms
Request 12343.jms was changed to priority(200).


5.2 Interactive Job


For interactive jobs, there are the following limits:

[PC Clusters]
Number of processes   Number of threads   Max. elapsed time   Amount of memory
to specify            to specify          to specify          to specify
------------------------------------------------------------------------------
32                    8                   4 hours             2GB

5.2.1 Interactive Job execution for PC Clusters


Use the following commands on the Login Server to execute interactive jobs on the PC Clusters.

Execution command   Meaning
-----------------------------------------------------------------------
srun                Serial program
                    Thread parallel program (max. number of threads: 8)
mpirun              MPI parallel program (max. number of processes: 32)
xpfrun              XPFortran parallel program (max. number of processes: 32)

If programs are executed without the above commands, they run on the Login Server itself, which may
adversely affect the system. The above commands must be used when executing interactive jobs. Also,
to execute programs in a script language such as Perl or Python as interactive jobs, please specify the
-pc option. Furthermore, if a job requires input from the keyboard, please specify -pty to turn
buffering of standard output off.

example 1)  Execute a serial program (execution module) (buffering of standard output off)
[username@ricc1:~] srun -pty ./serial.out

example 2)  Execute a serial program (script)
[username@ricc1:~] srun -pc ./serial.pl

example 3)  Execute a thread parallel program with 4 threads
[username@ricc1:~] srun -thread 4 ./thread.out

example 4)  Execute an MPI parallel program with 4 processes
[username@ricc1:~] mpirun -np 4 ./mpi.out

example 5)  Execute an XPFortran parallel program with 4 processes
[username@ricc1:~] xpfrun -np 4 ./xpf.out

Also, ISV applications such as Gaussian, ADF and ANSYS (solver) cannot be executed as interactive
jobs. Please execute them as batch jobs.


6. FTL (File Transfer Language)


6.1 Introduction
In RICC, the job execution area differs among clusters. On the Multi-purpose Parallel Cluster, the
shared area is used for job execution. On the Massively Parallel Cluster, the local area of the computing
nodes is used to realize fast I/O and to reduce access load as much as possible. Therefore, the files a
job needs must be transferred from the home area of the Login Server to the computing nodes before the
job runs, and computation results must be transferred from the computing nodes back to the home area of
the Login Server after the job finishes. FTL (File Transfer Language) is used for this file transfer.
FTL commands are embedded in the script file for job execution, or generated by the ftlgen command
(refer to 6.7 FTL generating tool : ftlgen).
In addition, the processes of a parallel program cannot be coordinated through files, since the
computing nodes do not share files with each other.

Please specify the FTL option in the job scripts. (default: share)


ex1) File transfer by using FTL
#MJS: -fstype ftl
ex2) Do not use FTL option
#MJS: -fstype share


6.2 Transfer input file


To transfer input files, specify one or more of the following items.

- Input files
- RANK-LIST and computing node's directory (optional)

Specify the names of the files to transfer as "Input files", the rank (0 to number of processes - 1) as
"RANK-LIST" and the name of the computing node's directory as "Computing node's directory".

If "RANK-LIST and Computing node's directory" is not specified, the specified input files are transferred
to the computing nodes in the same directory configuration as on the Login Server.
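
For example (a sketch based on the #BEFORE command used in the scripts in 5.1.1.2 and on the syntax in
6.6; the file and directory names are hypothetical):

#BEFORE: input                       Transfer "input" to all ranks, same directory as on the Login Server
#BEFORE: 0@$MJS_HOME/job: input      Transfer "input" only to rank 0, into $MJS_HOME/job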

Input files: /home/username/job/input
Rank: All ranks (Number of processes: n)

[Figure: the file "input" in the shared directory /home/username/job on the Login Server is copied to
the directory /home/username/job in the local area of every computing node (ranks 0 to n-1).]

Fig. 6-1 Transfer input files from Login Server to computing nodes


6.3 Transfer input directory


To transfer input directories, specify one or more of the following items.

- Input directories
- RANK-LIST and computing node's directory (optional)

Specify the names of the directories to transfer as "Input directories", the rank (0 to number of
processes - 1) as "RANK-LIST" and the name of the computing node's directory as "Computing node's
directory". All files in the specified input directories are transferred.

If "RANK-LIST and Computing node's directory" is not specified, the specified input directories are
transferred to the computing nodes in the same directory configuration as on the Login Server.

Input directory: /home/username/job/bin
Rank: All ranks (Number of processes: n)

[Figure: the directory /home/username/job/bin in the shared area on the Login Server is copied to
/home/username/job/bin in the local area of every computing node (ranks 0 to n-1).]

Fig. 6-2 Transfer input directory from Login Server to computing nodes


6.4 Transfer output file


To transfer output files, specify one or more of the following items.

- Output files
- RANK-LIST and Login Server's directory (optional)

Specify the names of the output files to transfer as "Output files", the rank (0 to number of
processes - 1) as "RANK-LIST" and the name of the Login Server's directory as "Login Server's directory".

If "RANK-LIST and Login Server's directory" is not specified, the specified output files are transferred
to the Login Server in the same directory configuration as on the computing nodes.
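
For example (a sketch based on the #AFTER command used in the scripts in 5.1.1.2 and on the syntax in
6.6; the file and directory names are hypothetical):

#AFTER: output                          Collect "output" from all ranks
#AFTER: 0@$MJS_HOME/results: output     Collect "output" only from rank 0, into $MJS_HOME/results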

Output file: /home/username/job/output
Rank: All ranks (Number of processes: n)

[Figure: the files output.0 ... output.n-1 in /home/username/job in each computing node's local area
are collected into the shared directory /home/username/job on the Login Server.]

Fig. 6-3 Transfer output files from computing nodes to Login Server


If the output files have the same name on different computing nodes, it is possible to avoid overwriting
by adding a rank number (the first rank number in each node) to the output file name.

Output file: /home/username/job/output
Rank: All ranks (Number of processes: n)

[Figure: each node's /home/username/job/output is transferred to the Login Server as output.0,
output.n-8, etc., with the first rank number of the node appended to avoid overwriting.]

Fig. 6-4 Transfer output files (in case of avoidance of overwriting)


6.5 FTL Basic Directory


There is a simple way to transfer files. Just by specifying the name of the directory containing the
files to transfer as the FTL basic directory, the files are transferred as follows:

[before job runs]

Files (not include directories) in the FTL basic directory on Login Server
--> recognized as input files and transferred to computing nodes

[after job ends]

Files (not include directories) in the FTL basic directory on computing nodes
--> recognized as output files and transferred to Login Server

FTL basic directory: /home/username/job

[Figure: before the job runs, the files input.0 ... input.n-1 in the FTL basic directory
/home/username/job on the Login Server are copied to /home/username/job in the local area of every
computing node (ranks 0 to n-1).]

Fig. 6-5 Transfer input files by specifying FTL basic directory


Transfer of output files by specifying the FTL basic directory works as follows.

1. A "ReqID" directory is created in the FTL basic directory on the Login Server.
2. Files in the FTL basic directory on the computing nodes are transferred into the "ReqID" directory
   on the Login Server.
3. The first rank number in the computing node is added to the output file name. (Note 1) (Note 2)

Note 1: It is possible to transfer output files with no rank number. However, output files are
overwritten if they have the same name.
Note 2: You can specify which type of files to transfer: files newly created while the job is running,
or files updated while the job is running. If neither type is specified, only newly created files
are transferred.

(Figure) The file output.0 in /home/username/job on node 0 (ranks 0-7) is transferred as
output.0.0, and the file output.0 on the last node (ranks n-8 to n-1) as output.0.n-8, into
/home/username/job/ReqID on the Login Server; likewise output.n-1 is collected as
output.n-1.0 and output.n-1.n-8.

Fig. 6-6 Transfer output files by specifying FTL basic directory


6.6 FTL Syntax


You can select "single line mode" or "multi-line mode" for FTL. Basically, you specify only one FTL
command in "single line mode" and two or more commands in "multi-line mode". An FTL sentence
begins with the # character. It is necessary to put # in the first column; if # is not put in the first
column, the line is not recognized as an FTL sentence.

Syntax of each mode is as follows.

Single line mode

#FTL command: [RANK-LIST[@directory]:] files[, files... ]

Multi-line mode

#<FTL command>
# [RANK-LIST[@directory]:] files[, files... ]
# [RANK-LIST[@directory]:] files[, files... ]
#</FTL command>

The items in an FTL command are described as follows.

files

Specify the input / output file names to transfer.

Files must be regular files or symbolic links.

Directories, devices, sockets and FIFOs cannot be specified.

Some meta characters are available (see 6.6.12.4 Meta character).

Specify a relative path from the directory where the batch job is submitted.

Use the FTL variables $MJS_HOME or $MJS_DATA when specifying an absolute path under /home
or /data. On FTL variables, please refer to 6.6.12.5 FTL variable.

RANK-LIST

Specify the destination of input files and the source of output files by RANK-LIST. For more
information on RANK-LIST, please refer to 6.6.12.6 RANK-LIST.

directory

Specify the destination directory.

Meta characters are not available.

Specify a relative path from the directory where the batch job is submitted.

Use the FTL variables $MJS_HOME or $MJS_DATA when specifying an absolute path under /home
or /data. On FTL variables, please refer to 6.6.12.5 FTL variable.

The restrictions on FTL are as follows.

Files which are not contained in /home or /data cannot be specified in an FTL command.

Blank characters cannot be included in file names and directory names.

A batch job's standard / error output files (extension: .jms) and swap files (extension: .swp) are not
transferred.

Multi-line mode cannot be nested.

Use the ftlchk command to check the FTL syntax and the existence of files. For more information,
please see ftlchk --man.

Example:
[username@ricc1 ~]$ ftlchk go.sh
=====================
 FTL Analysis Result
=====================
Line  Type    TargetRank  Stat  SourcePath[Login]  DestinationDir[Calc]
11    BEFORE  0-15              $CWD/a.out         --> $CWD


6.6.1 FTL Syntax (transfer input file)


Transfer input files from the Login Server to computing nodes using the following syntax.

More than one command can be specified in a script file.

Use a comma as separator to specify multiple files to transfer.

The destination of files is determined by the set of RANK-LIST and computing node's directory.

RANK-LIST and directory are optional. If "RANK-LIST and Computing node's directory" is not
specified, the specified input files are transferred to the computing nodes with the same directory
configuration as on the Login Server.

6.6.1.1 Single line mode (#BEFORE)

#BEFORE: [RANK-LIST[@computing node's directory]:] input file [...]

6.6.1.2 Multi-line mode (#<BEFORE> - #</BEFORE>)

#<BEFORE>
#[RANK-LIST[@computing node's directory]:] input file [...]
#</BEFORE>
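
For instance, the two forms below are equivalent (a sketch; a.out and input.txt are example file
names):

#BEFORE: a.out, input.txt

#<BEFORE>
# a.out
# input.txt
#</BEFORE>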


6.6.2 FTL Syntax (transfer input directory)


Transfer input directories from the Login Server to computing nodes using the following syntax.

More than one command can be specified in a script file.

Use a comma as separator to specify multiple directories to transfer.

The destination of input directories is determined by the set of RANK-LIST and computing node's
directory.

RANK-LIST and computing node's directory are optional. If "RANK-LIST and Computing node's
directory" is not specified, the specified input directories are transferred to the computing nodes
with the same directory configuration as on the Login Server.

6.6.2.1 Single line mode (#BEFORE_R)

#BEFORE_R: [RANK-LIST[@computing node's directory]:] input directory [... ]

6.6.2.2 Multi-line mode(#<BEFORE_R> - #</BEFORE_R>)

#<BEFORE_R>
#[RANK-LIST[@computing node's directory]:] input directory [...]
#</BEFORE_R>
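
For instance, the following sketch (bin and dataset are example directory names) transfers the bin
directory to every node and the dataset directory only to rank 0:

#BEFORE_R: bin
#BEFORE_R: 0: dataset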


6.6.3 FTL Syntax (transfer output file)


Transfer output files from computing nodes to the Login Server using the following syntax.

More than one command can be specified in a script file.

Use a comma as separator to specify multiple files to transfer.

The destination of output files is determined by the set of RANK-LIST and Login Server's directory.

RANK-LIST and Login Server's directory are optional. If "RANK-LIST and Login Server's
directory" is not specified, the specified output files are transferred to the Login Server with the
same directory configuration as on the computing nodes.

6.6.3.1 Single line mode (#AFTER)

#AFTER: [RANK-LIST[@Login Server's directory]:] output file [...]

6.6.3.2 Multi-line mode (#<AFTER> - #</AFTER>)

#<AFTER>
#[RANK-LIST[@Login Server's directory]:] output file [...]
#</AFTER>
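
For instance, the following sketch (output.log and result are example names) collects output.log
from all ranks into a result subdirectory of the directory where the job was submitted:

#AFTER: ALL@$MJS_CWD/result: output.log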


6.6.4 FTL Syntax (avoid overwrite output file)


Add the rank number (first rank number in a node) to output files transferred by the AFTER command
using the following syntax. This avoids overwriting output files when output files on different
computing nodes have the same name.

This command can be specified only once in a script file.

If this command is not specified, "off" is set for the flag.

This is valid for files (collected from 2 or more computing nodes) specified by the AFTER
command.

There is no multi-line mode.

6.6.4.1 Single line mode (#FTL_SUFFIX)

#FTL_SUFFIX: flag

Item    Value    Meaning
flag    on       Add rank number to output files
        off      Do not add rank number to output files

Table 6-1 FTL_SUFFIX flag
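
For instance, combined with an AFTER command (a sketch), output files named output on two
nodes are collected as output.0 and output.8 instead of overwriting each other:

#FTL_SUFFIX: on
#AFTER: output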


6.6.5 FTL Syntax (FTL basic directory)


Specify the FTL basic directory using the following syntax.

This command can be specified only once in a script file.

Only one FTL basic directory can be specified.

There is no multi-line mode for this command

Specify relative path from a directory where a batch job is submitted.

Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home or
/data. On FTL variable, please refer to 6.6.12.5 FTL variable.

6.6.5.1 Single line mode (#FTLDIR)

#FTLDIR: FTL basic directory

(note) When FTLDIR is used, unnecessary files may be transferred. Also, the existence of files is
checked after job execution even when there is no file to be transferred. For large-scale parallel jobs
these costs may be high, so please use BEFORE and AFTER instead of FTLDIR.
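
A minimal sketch of a job script using the FTL basic directory (paths and module name follow the
examples in the appendix):

#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTLDIR: $MJS_CWD
mpirun ./a.out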


6.6.6 FTL Syntax (File collect type of FTL basic directory)


Specify the file collect type for output files transferred from the FTL basic directory on the computing
nodes after the job finishes.

This command can be specified only once in a script file.

If this command is not specified, "new" is set for File collect type.

There is no multi-line mode for this command

6.6.6.1 Single line mode (#FTL_COLLECT_TYPE)

#FTL_COLLECT_TYPE: file collect type

Item               Value    Meaning
file collect type  new      Collect files which were not transferred at the start of the job
                   mtime    Collect only files updated while the job is running

Table 6-2 FTL_COLLECT_TYPE
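
For instance, to collect only the files updated while the job runs from the FTL basic directory (a
sketch):

#FTL_COLLECT_TYPE: mtime
#FTLDIR: $MJS_CWD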


6.6.7 FTL Syntax (Avoid adding rank number of FTL basic directory)

Avoid adding the rank number (first rank number in a node) to output files transferred by the FTLDIR
command using the following syntax. Output files will be overwritten when output files on different
computing nodes have the same name.

This command can be specified only once in a script file.

If this command is not specified, "off" is set for the flag.

This is valid for files specified by the FTLDIR command.

There is no multi-line mode.

6.6.7.1 Single line mode (#FTL_NO_RANK)

#FTL_NO_RANK: flag

Item    Value    Meaning
flag    on       Do not add rank number to output files
        off      Add rank number to output files

Table 6-3 FTL_NO_RANK


6.6.8 FTL Syntax (Rank Format)


Specify the number of digits of the rank number using the following syntax.

Rank numbers are added to file names with the specified number of digits. This is valid for
FTL variables (*1), the FTLDIR command and the AFTER command (when FTL_SUFFIX is set
to on).

This command is valid for the FTL commands (FTL variables, etc.) that are specified after this
command.

Specify a number from 0 to 9 as the number of digits.

There is no multi-line mode for this command.

(*1): On FTL variables, please refer to 6.6.12.5 FTL variable.

6.6.8.1 Single line mode (#FTL_RANK_FORMAT)

#FTL_RANK_FORMAT: number of digits

Item              Value   Meaning
number of digits  0-9     0: RANK FORMAT is not used
                          1-9: the number of digits of the rank number

Table 6-4 FTL_RANK_FORMAT

             Number of digits
RANK-LIST    none specified    1      2      3
1            1                 1      01     001
10           10                10     10     010
100          100               100    100    100

Table 6-5 RANK_FORMAT example
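
For instance, with three digits (a sketch; output is an example file name), files are collected as
output.000, output.001, ... instead of output.0, output.1, ...:

#FTL_RANK_FORMAT: 3
#AFTER: output.$MPI_RANK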


6.6.9 FTL Syntax (make directory)


Make directories on all computing nodes running a job before the job starts using the following syntax.

This command can be specified only once in a script file.

Use a comma as separator to specify multiple directories to make.

There is no multi-line mode for this command.

Specify relative path from a directory where a batch job is submitted.

Use FTL variables $MJS_HOME or $MJS_DATA when specifying absolute path from /home or
/data. On FTL variable, please refer to 6.6.12.5 FTL variable.

6.6.9.1 Single line mode (#FTL_MAKE_DIR)

#FTL_MAKE_DIR: directory [...]
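
For instance (a sketch; tmp and log are example directory names), the following creates two work
directories under the job execution directory on every node before the job starts:

#FTL_MAKE_DIR: $MJS_CWD/tmp, $MJS_CWD/log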


6.6.10 FTL Syntax (statistic information output)


Output statistics information (file transfer time, number of files, file size) of the files transferred before
and after the job execution to standard output at the end of the job using the following syntax.

This command can be specified only once in a script file.

If this command is not specified, "off" is set for the flag.

There is no multi-line mode.

6.6.10.1 Single line mode (#FTL_STAT)

#FTL_STAT: flag

Item    Value     Meaning
flag    off       No statistics information is output
        normal    Normal mode.
                  Output statistics information to standard output.
        detail    Detail mode.
                  In addition to Normal mode, output statistics information of the files
                  transferred to each rank.

Table 6-6 FTL_STAT flag


6.6.10.2 Output format of statistic information

Output items

Item            Meaning
ELAPSE(s)       Elapsed time of file transfer (unit: second)
FILE_NUM        Total number of transferred files
FILE_SIZE(KB)   Total size of transferred files (unit: KB)

Table 6-7 Output item of statistic information

Normal mode format

#===========    FTL STATISTICS INFORMATION    =============#
------------------------------------------------------------
          ELAPSE(s)      FILE_NUM       FILE_SIZE(KB)
 BEFORE
 -----------------------------------------------------------
  TOTAL          60            30               16384
 -----------------------------------------------------------
          ELAPSE(s)      FILE_NUM       FILE_SIZE(KB)
 AFTER
 -----------------------------------------------------------
  TOTAL          10            30               16384
#=========================================================#

Detail mode format

#===========    FTL STATISTICS INFORMATION    =============#
------------------------------------------------------------
               ELAPSE(s)      FILE_NUM       FILE_SIZE(KB)
 BEFORE
 -----------------------------------------------------------
  TOTAL              60            10                 100
  RANK: 0-7          60            10                 100
 -----------------------------------------------------------
               ELAPSE(s)      FILE_NUM       FILE_SIZE(KB)
 AFTER
 -----------------------------------------------------------
  TOTAL              60            10                  10
  RANK: 0-7          60
  RANK: 8-15         60
#=========================================================#


6.6.11 FTL Syntax (output transferred file information)


Output information on the files transferred before the job execution and the files created while the job
is running to standard output at the end of the job using the following syntax.

This command can be specified only once in a script file.

If this command is not specified, "off" is set for the flag.

There is no multi-line mode.

Directories with no files are not displayed.

6.6.11.1 Single line mode (#FTL_INFO)

#FTL_INFO: flag

Item    Value    Meaning
flag    off      No file information is output
        before   Output information on the files transferred before the job starts.
        after    Output information on the files (including files transferred before the
                 job starts) created while the job is running.
        all      Output information on the files transferred before the job starts and on
                 the files (including files transferred before the job starts) created
                 while the job is running.

Table 6-8 FTL_INFO flag


6.6.11.2 Output format of transferred file information

Output items

Item        Meaning
TIME        Access time of file (Month Date HH:MM)
SIZE(KB)    File size (unit: KB)
FILE_NAME   File name

Table 6-9 Output items of transferred file information

Output format
File information is output for each rank. The output format with flag "all" is shown below. With flag
"before" only the part marked (*1) is output; with flag "after" only the part marked (*2) is output.

#===============    FTL FILE INFORMATION    ===============#
-------------------        BEFORE        -------------------
[RANK: 0-7]
TIME            SIZE(KB)   FILE_NAME
---------------------------------------------------------
Jul 16 10:41       14246   /home/username/job/a.out
Jul 24 10:20         361   /home/username/job/input.1       (*1)
[RANK: 8-16]
TIME            SIZE(KB)   FILE_NAME
---------------------------------------------------------
Jul 16 10:41       14246   /home/username/job/a.out
Jul 24 10:20         361   /home/username/job/input.2
-------------------        AFTER         -------------------
[RANK: 0-7]
TIME            SIZE(KB)   FILE_NAME
---------------------------------------------------------
Jul 16 10:41       14246   /home/username/job/a.out
Jul 24 10:20         361   /home/username/job/input.1
Jul 24 10:25         361   /home/username/job/output        (*2)
[RANK: 8-16]
TIME            SIZE(KB)   FILE_NAME
---------------------------------------------------------
Jul 16 10:41       14246   /home/username/job/a.out
Jul 24 10:20         361   /home/username/job/input.2
#=======================================================#

6.6.12 FTL Syntax (others)


6.6.12.1 Comment
Characters after an exclamation mark (!) are regarded as a comment.

Example

#<BEFORE>
#! this line is comment      <-- This line is a comment.
# a.out ! b.out              <-- a.out is transferred, but b.out is not.
#</BEFORE>

6.6.12.2 Blank line


Blank lines and lines of only # are ignored.

Example

#<BEFORE>
#                <-- This line is ignored.
                 <-- This line is ignored.
# a.out
#</BEFORE>

6.6.12.3 Special character


Comma (,), colon (:), equal (=) and exclamation mark (!) are special characters in FTL commands.
Put a backslash before a special character when the special character is included in a file name or
directory name.

#<BEFORE>
# a\:b.out       <-- transfer a:b.out
#</BEFORE>


6.6.12.4 Meta character


The following meta characters are available in file names and directory names. However, they are not
available in the directory portion of a file name.

Meta character   Meaning
*                Match any (zero or more) characters
?                Match any single character

Table 6-10 Meta character list

Example

#<BEFORE>
# input.?            <-- input.0, input.1, input.3 are transferred.
# a*                 <-- a.out, a.1, a.2 are transferred.
# bin/exe*/a.out     <-- error: meta characters are not available for the directory portion.
#</BEFORE>

6.6.12.5 FTL variable


The following FTL variables are available in file names and directory names. However, they are not
available in the directory portion of a file name or in the FTL_MAKE_DIR command.
Using FTL variables, input / output file transfer commands for MPI jobs can be specified easily.

Variable         Meaning
$MJS_HOME        Home directory path (/home/username)
$MJS_DATA        Data directory path (/data/username)
$MJS_CWD         Directory path where the job is submitted
$MJS_REQID       REQUEST-ID.
                 Available in file names in the AFTER command and in directory names.
$MJS_REQNAME     REQUEST-NAME.
                 Available in file names and directory names.
$MJS_BULKINDEX   Bulk Index ID.
                 Available in file names and directory names.
$MPI_RANK        MPI rank (from 0 to number of processes - 1)
$XPF_RANK        XPF processor identification number (from 1 to number of processes)

Table 6-11 FTL variable list

Example 1

#<AFTER>
# 0@$MJS_CWD: log/output     <-- the log/output file on the MPI master node is transferred
#</AFTER>                        to the directory where the job is submitted

Example 2

#BEFORE: input.$MPI_RANK

With the above BEFORE command, if an MPI program of 16 processes is executed, input files are
transferred for 16 processes. The files /home/username/input.0 - input.7 are transferred to the
first computing node and the files /home/username/input.8 - input.15 to the second computing
node, as indicated in Fig. 6-7 Example of input file transfer with FTL variable.

(Figure) The files input.0 - input.7 in /home/username on the Login Server are transferred to the
local area of the first computing node (ranks 0-7) and input.8 - input.15 to the second computing
node (ranks 8-15), each under /home/username.

Fig. 6-7 Example of input file transfer with FTL variable


6.6.12.6 RANK-LIST
Specify the destination of input files and the source of output files using the following formats.

If ranks are specified redundantly, they are processed as if specified once.

If nonexistent ranks are specified, no file is transferred for those ranks.

If existent and nonexistent ranks are specified at the same time, files are transferred to the
existent ranks but not to the nonexistent ranks.

Item   Format            Meaning
I      1                 File transfer command to rank 1 of a computing node.
II     1-3               File transfer command to rank 1, rank 2 and rank 3 of computing nodes.
III    1,3               File transfer command to rank 1 and rank 3 of computing nodes.
IV     1-3,5,7           File transfer command to rank 1, rank 2, rank 3, rank 5 and rank 7
       (combination of   of computing nodes.
       items II, III)
V      *                 File transfer command to all computing nodes assigned to the job.
VI     ALL               File transfer command to all computing nodes assigned to the job.
VII    MASTER            File transfer command to the master node (rank 0) of the job.

Table 6-12 RANK-LIST format

The ranges of RANK-LIST for each job type are as follows.

Job type                     Range of RANK-LIST
Serial job                   0
MPI parallel job             0 - (number of processes - 1)
OpenMP / auto parallel job   0
Hybrid job                   0 - (number of processes - 1)

Table 6-13 Range of RANK-LIST for Job type
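
For instance (a sketch; the file names are examples), the first command transfers input.dat to
ranks 0 to 3 and rank 8, and the second collects summary.log only from the master node:

#BEFORE: 0-3,8: input.dat
#AFTER: MASTER@$MJS_CWD: summary.log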


6.7 FTL generating tool : ftlgen


ftlgen generates FTL command lines and job submit option lines interactively.

ftlgen <option>

Option          Meaning
-chk            Execute the ftlchk command after creating the FTL command lines
-o <filename>   Output the shell script to a file

[example] Execute the ftlgen command (tab completion is available for input).


[username@ricc1:~] ftlgen
MJS: Project id: G00001                                      <-- specify project ID
MJS: Number of process(range: 1-8192, default: 1): 256       <-- specify number of processes
MJS: Number of thread(range: 1-8, default: 1): 1             <-- specify number of threads
MJS: Merge stderr to stdout ?('y' or 'n', default: 'y'): y   <-- merge standard output / error output
MJS: Run on current working directory ?('y' or 'n', default: 'y'): y
                                                             <-- specify job execution directory
MJS: Other qsub options: -time 1:00:00 -mem 1.2GB            <-- specify other qsub options
MJS: Executable module and command path: a.out               <-- specify execution module
FTL(PRE): Are there any input file ?('y' or 'n'): y          <-- if input files exist
FTL(PRE): Input file or directory: input                     <-- specify input file
FTL(PRE): Destination rank number(0-255): *                  <-- specify rank numbers
FTL(PRE): Transfered Directory: [ENTER](skip)                <-- specify destination directory
                                                                 (if omitted, the job execution
                                                                 directory is used)
FTL(PRE): Enter more ?('y' or 'n'): n                        <-- if more input files exist
FTL(POST): Are there any output file ?('y' or 'n'): y        <-- if output files exist
FTL(POST): Output file: output.log                           <-- specify output file
FTL(POST): Source rank number(0-255): *                      <-- specify rank numbers
FTL(POST): Transfered Directory: outputdir                   <-- specify destination directory
FTL(POST): Enter more ?('y' or 'n'): n                       <-- if more output files exist
#!/bin/sh                                                    <-- output result
#--- qsub options ---#
#MJS: -project G00001
#MJS: -proc 256
#MJS: -thread 1
#MJS: -eo
#MJS: -cwd
#MJS: -time 1:00:00 -mem 1.2GB
#MJS: -compiler fj
#MJS: -parallel fjmpi
#--- FTL file information ---#
#BEFORE:*: $MJS_CWD/input
#AFTER:*@$MJS_CWD/outputdir: $MJS_CWD/output.log
#BEFORE:*: $MJS_CWD/a.out
#--- Job execution ---#
mpirun a.out

7. Development Environment
7.1 Endian conversion
7.1.1 Outline of endian
Endianness is the convention for how a number consisting of multiple bytes is stored in memory. For
example, when the hexadecimal number 0x1234 is stored, the method that stores 0x12 into the 1st
byte and 0x34 into the 2nd byte is called big endian. On the other hand, the method that stores 0x34
into the 1st byte and 0x12 into the 2nd byte is called little endian.
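
As an illustration only (this small program is not part of the RICC toolchain), a few lines of C can show
which convention the machine you are on uses:

#include <stdio.h>

int main(void)
{
    unsigned int x = 0x12345678;             /* a 4-byte value                   */
    unsigned char *p = (unsigned char *)&x;  /* inspect its lowest-address byte  */

    if (*p == 0x78)
        printf("little endian\n");   /* least significant byte is stored first */
    else
        printf("big endian\n");      /* most significant byte is stored first  */
    return 0;
}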

7.1.2 Endian type of RSCC and RICC


RICC consists of little endian computers. However, RSCC (RIKEN Super Combined Cluster) consisted
of both big endian and little endian computers, and big endian was used for unformatted WRITE /
READ statements. Therefore, please pay attention when reading Fortran unformatted output files
(big endian) created on RSCC.

System   Endian
RSCC     Big endian
RICC     Little endian

Table 7-1 Endian type of RSCC and RICC

7.1.3 Endian type


7.1.3.1 Fujitsu compiler
Specify the runtime option -Wl,-T to read or write big endian data with the Fujitsu compiler. With the
-T option, logical type data, integer type data and IEEE floating-point data are converted between big
endian and little endian in unformatted I/O statements.

example 1) Convert unit number 10 to little endian.

[username@ricc1:~] srun ./serial.out -Wl,-T10

example 2) Convert all unit numbers to little endian.

[username@ricc1:~] srun ./serial.out -Wl,-T


7.1.3.2 Intel compiler


Specify the environment variable F_UFMTENDIAN to read or write big endian data with the Intel
compiler.

example 1) Convert unit number 10 to little endian.

[username@ricc1:~] export F_UFMTENDIAN=10
[username@ricc1:~] srun ./serial.out

example 2) Convert all unit numbers to little endian.

[username@ricc1:~] export F_UFMTENDIAN=big
[username@ricc1:~] srun ./serial.out

7.2 Debugger
The debugger enables the user to run a program under control of the debugger to verify processing
logic. The following types of operations can be performed for serial programs and MPI programs in
Fortran and C/C++, and for XPFortran programs.

Application execution control

Setting of program execution stop position

Expression and variable evaluation and display

Use of the calling stack

7.2.1 The preparation to use debugger


The following two compilation options must be specified when you compile and link programs to debug.

-g
Produce debugging information. If this option is omitted, you cannot display the values of variables
and so on.

-Ktl_trt
Link the tool runtime library. This option enables the use of the debug, profiling and MPI trace
functions at execution of a program. This option is effective by default.


7.2.2 Start debugger


Use the fdb command (CUI) to launch the debugger. For more information on the fdb command,
please refer to the man page of fdb. For information on xfdb (GUI), please refer to the Debugger
User's Guide.

[username@ricc1:~] f77 -pc -g -Ktl_trt sample.f
[username@ricc1:~] srun fdb a.out                     <-- start debugger
FDB [Fujitsu Debugger for C/C++ and Fortran] Version 7.0MT/OMP
Please wait to analyze the DEBUG information.
fdb* list
  5              INTEGER i
  6              read(*,*) i
  7              print *,' ** fortran77 output=',i
  8              go to (10,20,30) , i
  9              print *,' i=0',i
 10              go to 90
 12   10         print *,' i=',i
 13              go to 90
 14
fdb* break 10                                         <-- insert break point
#1  0x100000ad0 (MAIN__ + 0x118) at line 10 in /home/username/sample.f
fdb* show break
Num  Address             Specify  Stop?  Where
#1   0x0000000100000ad0  Enable   Yes    (MAIN__ + 0x118) at line 10 in /home/username/sample.f
fdb* p i                                              <-- insert print command (variable i)
Result = 123
fdb* c
Continue program: a.out
The program: a.out terminated.


8. Tuning
8.1 Tuning overview
Modifying a program so that it finishes execution faster is called tuning. Tuning a program is a cycle
of collecting tuning information, performance evaluation/analysis, modifying source code, and
measuring performance again.
First, find the part of the program where most of the execution time is spent. Generally, a large
tuning effect is achieved by speeding up that part.
There are the following methods to get execution time information.

Call the subroutines which get time information in programs

Use the option of a batch job which collects statistics information

Use the profiler

(Figure) Tuning cycle: collecting tuning information -> performance evaluation/analysis -> tuning
(change compile options, modify source code) -> iteration.

Fig 8-1 Tuning overview

8.2 Time measurement


8.2.1 Fortran program
The CPU_TIME subroutine returns CPU processing time in seconds.

example 1) Invoke the CPU_TIME subroutine


real(kind=8) start_time, stop_time
...
call cpu_time(start_time)
...portion to be measured
call cpu_time(stop_time)
write(6,*) "time = ", stop_time - start_time

8.2.2 C program
The clock function returns an approximate value of processing time.

example 1) Invoke the clock function

#include <time.h>
clock_t start_time, stop_time;
start_time = clock();
...portion to be measured
stop_time = clock();
printf("time = %10.3f\n",
       (double)(stop_time - start_time) / CLOCKS_PER_SEC);

8.2.3 MPI program


Use the MPI_Wtime function to measure elapsed time. Invoke the MPI_Wtime function before and
after the portion to be measured. The time difference between them is the elapsed time.

example 1) Invoke the MPI_Wtime function (Fortran)

real(kind=8) start_time, stop_time
...
call mpi_barrier(mpi_comm_world, ierr)
start_time = mpi_wtime()
....portion to be measured
call mpi_barrier(mpi_comm_world, ierr)
stop_time = mpi_wtime()
if (myrank .eq. 0) then
  write(6,*) "time = ", stop_time - start_time
end if

example 2) Invoke the MPI_Wtime function (C)


double start_time, stop_time;
...
MPI_Barrier(MPI_COMM_WORLD);
start_time = MPI_Wtime();
....portion to be measured
MPI_Barrier(MPI_COMM_WORLD);
stop_time = MPI_Wtime();
if (myrank == 0) {
    printf("time = %lf\n", stop_time - start_time);
}

8.2.4 System resource statistics information of batch job


By specify -oi or -OI option when submitting a jobs, the summary information and resource
information per each computing node is written to standard output file.
[username@ricc1:~]

cat go.sh

#!/bin/sh
#MJS: -proc 8
#MJS: -cwd
#MJS: eo
#MJS: -oi
#MJS: -time 1:00:00
#BEFORE: a.out
mpirun ./a.out


example 1) Used resource information (the standard output of a batch request)

[username@ricc1:~] cat go.sh.o2733417.jms
(...)
Allocated Resource                                <- allocated resource of entire job
  Virtual Nodes              :           8 Node
  Before Free Memory
    Total Large Page Memory  :           0 Mbyte
    Total Normal Page Memory : 10737418240 Byte
  After Free Memory
    Total Large Page Memory  :           0 Mbyte
    Total Normal Page Memory : 10737418240 Byte
  CPUs                       :           8 CPU
  Inter-Node Barrier         :           0 Unit
  Execmode                   :  CHIP_SHare
  Elapse time limit          :    3600.000 sec
Used Resource                                     <- used resource of entire job
  Total System CPU Time      :         463 msec
  Total User CPU Time        :      470516 msec
  Total Large Page Memory    :           0 Mbyte
  Total Normal Page Memory   :   342716416 Byte
  CPUs                       :           8 CPU
  Inter-Node Barrier         :           0 Unit
---------------------------------------
Virtual Node Information : NODE : mpc0448         <- computing node
  Archi Information          :          PG
  Allocated Resource                              <- allocated resource per process
    Before Free Memory
      Large Page Memory      :           0 Mbyte
      Normal Page Memory     :  1342177280 Byte
    After Free Memory
      Large Page Memory      :           0 Mbyte
      Normal Page Memory     :  1342177280 Byte
    Free memory time         :           0 msec
    CPUs                     :           1 CPU
    CPU time limit           :   UNLIMITED
  Used Resource                                   <- used resource per process
    Large Page Memory        :           0 Mbyte
    Normal Page Memory       :    50913280 Byte
    CPUs                     :           1 CPU
    CPU Time
                          System time    User time
      Max CPU Time     :     236 msec    59426 msec
      Total CPU Time   :     236 msec    59426 msec
      SBID ChipID CPUID   System time    User time
      0    0      0          236 msec    59426 msec

8.3 Program development support tool


8.3.1 Fujitsu compiler
8.3.1.1 Profiling function
The profiler is a tool for collecting information on application performance. To improve application
performance, it is a usual and effective method to find the location where much execution time is
consumed and speed it up.
The profiler can output the following information.

Time statistics information
  Elapsed time, breakdown of user CPU time / system CPU time, etc.

Interprocess communication information
  Time of interprocess communication and of waiting for synchronization by MPI and XPFortran

MPI library elapsed time information
  Elapsed time of executing the MPI library

8.3.1.2 Collect profiling data


Use the srun/mpirun/xpfrun command with the -prof or -profopt option to collect profiling data.
With these options, the srun/mpirun/xpfrun commands invoke the fpcoll command internally.
For more information on the fpcoll command, please refer to the Profiler User's Guide. Use the
-profopt option to specify arguments to the fpcoll command.
In interactive jobs, profiling data collection and profiler information output can be performed at the
same time. In batch jobs, since the Massively Parallel Cluster does not have a shared area, profiling
data collection and profiler information output cannot always be performed at the same time. In that
case, it is necessary to perform profiling data collection and profiler information output separately.


example 1) Execute a serial job by interactive job (-prof option)

[username@ricc1 ~]$ srun -prof ./stream
(execution result is skipped)
Fujitsu Performance Profiler Version 3.1
Measured time                : Thu Jul 30 01:11:06 2009
CPU frequency                : Process 0  2933 (MHz)
Type of program              : SERIAL
Average at sampling interval : 11.0 (ms)
Measured range               : All ranges
--------------------------------------------------------------
______________________________________________________________
Time statistics
                 Elapsed(s)    User(s)    System(s)
  Application       28.4038    28.2500       0.1100
  Process 0         28.4038    28.2500       0.1100
_________________________________________________________________
Procedures profile
**************************************************************
Application - procedures
**************************************************************
  Cost           %    Start    End
  2569    100.0000       --     --   Application
  2559     99.6107      127    285   main
    10      0.3893      337    399   checkSTREAMresults
____________________________________________________________________
Lines profile
*****************************************************************
Application - lines
*****************************************************************
  Cost           %    Line
  2569    100.0000      --   Application
   629     24.4842     251   main

example 2) Execute an MPI parallel job by interactive job (-prof option)

Fujitsu Performance Profiler Version 3.1
Measured time                : Thu Jul 30 01:15:59 2009
CPU frequency                : Process 0  2933 (MHz)
Type of program              : MPI
Average at sampling interval : 11.0 (ms)
Measured range               : All ranges
-------------------------------------------------------------
_____________________________________________________________
Time statistics
                 Elapsed(s)    User(s)    System(s)
  Application        1.0764     0.9159       0.0800
  Process 0          1.0764     0.9159       0.0800
_________________________________________________________________
Communication profile
                 Elapsed(s)    Communication(s)         %
  Application        1.0764              0.7343   68.2220
  Process 0          1.0764              0.7343   68.2220

  Send + Put
  +--------------------------------------------------+
  |##########################                        |  52 %  Process 0
  +--------------------------------------------------+
  Percentage of time waiting for a send and put

  Received + Get
  +--------------------------------------------------+
  |########                                          |  16 %  Process 0
  +--------------------------------------------------+
  Percentage of time waiting for a received and get
_________________________________________________________________
Procedures profile
**************************************************************
Application - procedures
**************************************************************
  Cost           %    Start    End
    83    100.0000       --     --   Application
    51     61.4458       --     --   __GI_memcpy
     9     10.8434      369    397   IMB_ass_buf
     7      8.4337       --     --   memcpy_nts_asm64a
     3      3.6145       --     --   _LowLevel_MutexUnlock
     2      2.4096       --     --   _LowLevel_Exchange4
     1      1.2048       --     --   intra_Reduce
     1      1.2048       --     --   mpigfc_
     1      1.2048       --     --   PMPI_Sendrecv
     1      1.2048       --     --   _GMP_StopSendTimer
     1      1.2048       --     --   _GMP_Send
_________________________________________________________________
Loops profile
**************************************************************
Application - loops

example 3) MPI parallel job by batch job

1. Specify the fpcoll command's options with the -profopt option to collect profiling data. Items of
profiling data can be specified by the -I option. Profiling data is created in a directory specified by
the -d option.
When executing an application on the Massively Parallel Cluster, transfer the profiling data to the
Login Server by FTL.

$ cat go.sh
#!/bin/sh
#------- qsub option -------#
#MJS: -pc
#MJS: -proc 64
#MJS: -eo
#MJS: -time 10:00
#MJS: -cwd
#------- FTL command -------#
#BEFORE: a.out
#AFTER: ALL@${MJS_REQID}_prof: profile-data/*
#------- Program Execution -------#
mpirun -profopt "-C -Icpu,mpi -d profile-data" ./a.out

2. Use the fprof command to display the profiler information. Items of profiling data to display can be
specified by the -I option. Specify the directory of the profiling data by the -d option.

$ fprof -Impi -d 1417379.jms_prof


--------------------------------------------------------------------
Fujitsu Performance Profiler Version 3.1
Measured time                : Wed Sep 2 15:50:18 2009
CPU frequency                : Process 0 - 63  2933 (MHz)
Type of program              : MPI
Average at sampling interval : 11.0 (ms)
Measured range               : All ranges
--------------------------------------------------------------------
_____________________________________________________________________
Time statistics
                 Elapsed(s)       User(s)    System(s)
  Application       59.3825     3701.1963      16.8100
  Process 14        59.3825       58.6720       0.1700
  Process 25        59.3759       55.9310       0.3200
  Process 36        59.3754       57.2190       0.4700
  Process 50        59.3744       58.6740       0.1500
  Process 13        59.3743       58.6620       0.1700
  Process 42        59.3710       56.6740       0.2800
  Process 24        59.3691       55.9420       0.4100
  Process 34        59.3690       57.7630       0.2100
  Process 18        59.3689       57.6010       0.2200
  Process 8         59.3680       57.7970       0.3200
  Process 48        59.3665       58.6170       0.2100
  Process 27        59.3649       55.7580       0.4000
  Process 32        59.3621       57.5100       0.2800
  Process 16        59.3618       57.5390       0.2900
  Process 12        59.3611       58.5850       0.2500
  Process 47        59.3609       58.5590       0.1700
_____________________________________________________________________
MPI libraries profile - based on the user procedure.
*********************************************************************
Application - MPI libraries
*********************************************************************
  Elapsed(s)          %     Call to
     59.3825   ---.----          --   Application
      3.2200     5.4225       45312   jacobi_   (199 - 250)
      2.6808     4.5144      226560   sendp1_   (577 - 629)
      1.9381     3.2638      226560   sendp2_   (521 - 573)
      0.7023     1.1827      226560   sendp3_   (465 - 517)
      0.3752     0.6318         512   initcomm_ (254 - 332)
      0.0840     0.1415         576   MAIN__    (38 - 142)
      0.0000     0.0000         384   initmax_  (336 - 440)

8.4 Network topology


The network topology for the Massively Parallel Cluster, Multi-purpose Parallel Cluster and
MDGRAPE-3 Cluster is a fat-tree topology, which consists of 60 leaf switches connecting the
computing nodes and 2 spine switches connecting the leaf switches (refer to Fig. 8-2 Network
topology outline diagram).
Each leaf switch has 24 ports; 20 of them are connected to computing nodes and 4 of them are
connected to the spine switches. Therefore, when the 20 computing nodes connected to the same
leaf switch are concurrently communicating with computing nodes connected to other switches, the
communication data of the 20 computing nodes must be transferred over 4 InfiniBand cables, and
the network bandwidth can be limited to as little as 1/5.
(Figure) Two spine switches at the top, each connected to every leaf switch by 2 InfiniBand links
(120 links per spine switch); each leaf switch connects downward to 20 compute nodes.

Fig. 8-2 Network topology outline diagram

The job scheduler of RICC minimizes the number of leaf switches connecting the computing nodes
allocated to a parallel job. However, the allocated computing nodes might be distributed over more
leaf switches in a situation where the system usage ratio is high, because the computing nodes
allocated to the next job depend on the jobs which finished previously.
This difference in the allocation of computing nodes may not have an impact on normal job execution,
but it may have an impact on jobs with a high communication load, such as network communication
benchmark tests.


9. How to use Archive system


To transfer files to the Archive system over the network, use the special file transfer commands (pftp,
hsi and htar).
* pftp is an extended command of ordinary ftp. The usage of pftp is the same as ftp.
* hsi is an extended command of pftp. It can transfer directories.
* htar is an extended command of ordinary tar. The usage of htar is the same as tar.
* The size of a file is restricted to 1.22TB on pftp, hsi and htar. When transferring files to the Archive
system by htar, all transferred files are archived as one htar format file. Therefore, the total size of
the transferred files must be less than 1.22TB.

9.1 Configuration
If you use hsi or htar for the first time on RICC, or you use them after your RICC password is
updated, use the arc_keytab command to generate a Keytab file for authentication.
You do not need to generate the Keytab file again after that.

Example:
[username@ricc:~] arc_keytab
Getting a KEYTAB file for user: username
Please wait ....
...............
A KEYTAB file was generated successfully.

As above example, if "successfully" is displayed, configuration completes.


9.2 pftp
9.2.1 Get file
9.2.1.1 Login
[username@ricc:~] pftp arc
Using /opt/hpss/etc/HPSS.conf
Connected to arc.
220 hpcore FTP server (HPSS 7.1 PFTPD V1.1.1 Tue Jan 19 07:16:29 JST 2010) ready.
Parallel stripe width set to (1).
Name (arc:username):                        <-- Enter return key
331 Password required for username.
Password:*********                          <-- Enter RICC password
230 User username logged in as username@HPCORE.HPSS
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>                                        <-- Login completed

9.2.1.2 Transfer file


ftp> pget file_name                         <-- Enter file name
remote: file_name local: file_name
200 Command Complete (4104704, "file_name", 0, 1, 4194304, 0).
200 Command Complete.
150 Transfer starting.
226 Transfer Complete.(moved = 4104704).
4104704 bytes received in 0.1400 seconds (27.961 MBytes/sec)
200 Command Complete.

9.2.1.3 Confirm transferred file and Logout


ftp> !ls -la                                <-- Confirm transferred files
-rw-r--r--  1 username  groupname  4104704  Jul 28 20:40  file_name
ftp> bye                                    <-- Logout
221 Goodbye.
[username@ricc:~]


9.3 hsi
9.3.1 Get file
9.3.1.1 Login
[username@ricc:~] hsi
Username: username
UID: UID Acct: UID(UID) Copies: 1 Firewall: off [hsi.3.5.3 Wed Jan 20 07:32:04 JST 2010]
A:[RICC]/home/username->                    <-- Login completed

9.3.1.2 Get file


A:[RICC]/home/username-> get -R testdir     <-- Enter file name
get '/home/username/testdir/testfile1' : '/home/username/testdir/testfile1'
(2009/07/28 20:57:57 1048576 bytes, 7050.6 KBS )
get '/home/username/testdir/testfile2' : '/home/username/testdir/testfile2'
(2009/07/28 20:57:58 1048576 bytes, 10074.9 KBS )
get '/home/username/testdir/testfile3' : '/home/username/testdir/testfile3'
(2009/07/28 20:57:58 1048576 bytes, 17090.1 KBS )

9.3.1.3 Confirm got file and Logout


A:[RICC]/home/username-> !ls -l testdir     <-- Confirm got files
total 6144
-rw-------  1 username  groupname  1048576  Jul 28 21:03  testfile1
-rw-------  1 username  groupname  1048576  Jul 28 21:03  testfile2
-rw-------  1 username  groupname  1048576  Jul 28 21:03  testfile3
A:[RICC]/home/username-> quit               <-- Logout
[username@ricc:~]


9.4 htar
The following are the restrictions of the htar command.

The size of a member file is up to 68,719,476,735 (64G - 1) bytes.

The number of member files in a tar file is up to 1 million.

The directory name is up to 154 characters and the file name is up to 99 characters when the path
name of a member file is divided into directory name / file name.
(Example) Path name      : /home/username/dir1/dir2/test.data
          Directory name : /home/username/dir1/dir2
          File name      : test.data

The link name of a symbolic link is up to 99 characters.
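
Although this section covers only listing and extraction, htar follows tar conventions, so creating an
archive on the Archive system is typically done with the -cf option (a sketch; test.tar and the work
directory are example names):

[username@ricc:~] htar -cf test.tar work
HTAR: HTAR SUCCESSFUL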

9.4.1 Confirm put file


Confirm the contents of a tar file with the -tf option.

[username@ricc:~] htar -tf test.tar
......
HTAR: -rw-r--r--  username/groupname   1252  2004-06-18 09:45  work/test1
HTAR: -rw-r--r--  username/groupname   3390  2004-03-04 11:56  work/test2
HTAR: -rw-r--r--  username/groupname  20932  2004-11-09 17:49  work/test3
HTAR: HTAR SUCCESSFUL

To confirm the tar file name, login with the hsi command and then use the ls command.

9.4.2 Get file


The usage is the same as the tar command. Extract files with the -xf option.

[username@ricc:~] htar -xf test.tar
HTAR: HTAR SUCCESSFUL

Files are extracted into the current directory.


10. RICC Portal


10.1 RICC Portal

On RICC Portal, users can operate files on the Login Server, compile and link programs, and submit
jobs for all computing server systems using a web interface.

10.1.1 URL to access


Access the following URL to login to RICC Portal:

https://ricc.riken.jp

10.1.2 How to Login


In the following login window, enter your RICC user account and RICC password, and then click the
LOGIN button. After authentication completes, RICC Portal is available.
When prompted, select the client certificate and click [OK].

(Figure) 1. Enter RICC user account  2. Enter RICC password  3. Click [LOGIN]

Fig. 10-1 RICC login window

On usage of RICC Portal, please click help icons of functions or refer to online manual on RICC Portal.

11. Manual
Access RICC Portal from a Web browser. On accessing RICC Portal, please refer to 10 RICC Portal.
After login, click the links under [Documentation] on the left of the menu to refer to the online manuals.

The available manuals are listed in the next section.


(Figure) 1. Click [MAIN]  2. Click [Documentation]  [Product Manual] -> the product manuals are
available for reference.

Fig. 11-1 RICC Portal online manual window


11.1 Product manual


11.1.1 Common
RICC Portal User's Guide

11.1.2 Language
Fortran User's Guide
Fortran Language Reference
Fortran Compiler Messages
Fortran Runtime Messages
C User's Guide
C++ User's Guide
C++ Compiler Feature
XPFortran User's Guide
MPI User's Guide

11.1.3 Programming Tools


Debugger User's Guide
MPI Tracer User's Guide
Programming Workbench User's Guide
Profiler User's Guide

11.1.4 Scientific Subroutine Library II (SSL II)


List of Subroutines
How to use SSL II
How to link-edit SSL II
SSLII User's Guide
SSLII Extended Capabilities User's Guide
SSLII Extended Capabilities User's Guide II
How to compile (Thread-Parallel Capabilities)
Thread-Parallel Capabilities User's Guide
How to compile (C language)
SSLII User's Guide (C language)
How to use C-SSL II
How to compile Thread-Parallel Capabilities (C language)
Thread-Parallel Capabilities User's Guide (C language)
How to compile (MPI)
MPI User's Guide

11.1.5 BLAS / LAPACK / ScaLAPACK


User's Guide

11.1.6 Intel Compiler


Fortran User's Guide
C User's Guide
Math Kernel Library(MKL) User's Guide

11.1.7 PGI Compiler


PGI User's Guide
PGI Tools Guide
PGI Fortran Reference

11.1.8 Message Passing Toolkit (MPT)


User's Guide


Appendix


1. FTL Examples
Job execution scripts using FTL are introduced in this appendix. FTL commands are shown in bold.
The following environment variables are used in this appendix.

Environment variable   Value
$MJS_CWD               /home/username/job
$MJS_DATA              /data/username
$MJS_REQID             REQUEST-ID of the job


1.1 Execute serial job


1.1.1 sample 1 (transfer output file to job execution directory)

Content of job execution

Execute execution module a.out. Transfer the output file to the job execution directory.

Item                         Value      Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file / directory       (none)
Output file                  output
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE: a.out
srun ./a.out
#AFTER: output


Transfer input file

(Figure) a.out in $MJS_CWD on the Login Server is transferred to $MJS_CWD on node 0 (rank 0).

Transfer output file

(Figure) output in $MJS_CWD on node 0 (rank 0) is transferred back to $MJS_CWD on the Login
Server.

1.1.2 sample 2 (Transfer output file to /data)

Content of job execution

Execute execution module a.out. Transfer the output file to /data/username/data.

Item                         Value           Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file                   (none)
Output file                  output
Destination of output file   $MJS_DATA/data

Job execution script


#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE: a.out
srun ./a.out
#AFTER: 0@${MJS_DATA}/data: output


Transfer input file

(Figure) a.out in $MJS_CWD on the Login Server is transferred to $MJS_CWD on node 0 (rank 0).

Transfer output file

(Figure) output on node 0 (rank 0) is transferred to /data/username/data on the Login Server.


1.1.3 sample 3 (Transfer directory)

Content of job execution

Transfer a directory which has the files necessary for job execution. Execute execution module
bin/a.out. Transfer the output file to the job execution directory.

Item                         Value      Remark
Job execution directory      $MJS_CWD
Transfer directory           bin
Input file                   (none)
Output file                  output
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 1 -eo
#MJS: -cwd
#BEFORE_R: bin
srun ./bin/a.out
#AFTER: output


Transfer input directory

(Figure) The bin directory in $MJS_CWD on the Login Server is transferred to $MJS_CWD on node 0
(rank 0).

Transfer output file

(Figure) output in $MJS_CWD on node 0 (rank 0) is transferred back to $MJS_CWD on the Login
Server.


1.2 Execute parallel job


1.2.1 sample 1 (16 cores in parallel job)

Content of job execution

Transfer the input file necessary for each rank. Execute MPI execution module a.out as a 16-core
parallel job. Transfer the output file of each rank to the job execution directory.

Item                         Value                  Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file                   input.0 - input.15     Input file differs for each rank
Output file                  output.0 - output.15   One output file for each rank
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out
# 0: input.0, input.1, input.2, input.3, input.4, input.5, input.6, input.7
# 8: input.8, input.9, input.10, input.11, input.12, input.13, input.14, input.15
#</BEFORE>
mpirun ./a.out
#<AFTER>
# 0: output.0, output.1, output.2, output.3, output.4, output.5, output.6, output.7
# 8: output.8, output.9, output.10, output.11, output.12, output.13, output.14, output.15
#</AFTER>

Transfer input file

(Figure) a.out and input.0 - input.7 are transferred to node 0 (ranks 0-7), and a.out and
input.8 - input.15 to node 1 (ranks 8-15).

Transfer output file

(Figure) output.0 - output.7 from node 0 and output.8 - output.15 from node 1 are transferred
back to $MJS_CWD on the Login Server.

1.2.2 sample 2 (Use FTL variable)

Content of job execution

Use the FTL variable ($MPI_RANK) for the 1.2.1 sample 1 (16 cores in parallel job) case.

Item                         Value                  Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file                   input.0 - input.15     Input file differs for each rank
Output file                  output.0 - output.15   One output file for each rank
Destination of output file   $MJS_CWD
Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out, input.$MPI_RANK
#</BEFORE>
mpirun ./a.out
#<AFTER>
# output.$MPI_RANK
#</AFTER>

Transfer input file

It is the same as 1.2.1 sample 1 (16 cores in parallel job)

Transfer output file

It is the same as 1.2.1 sample 1 (16 cores in parallel job)


1.2.3 sample 3 (Transfer files of same file name avoiding overwriting)

Content of job execution

Execute MPI execution module a.out as a 16-core parallel job. Transfer the output files of the same
name on each rank to the job execution directory, avoiding overwriting.

Item                         Value      Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file                   (none)
Output file                  output     One output file per rank
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#BEFORE: a.out
mpirun ./a.out
#FTL_SUFFIX: on
#AFTER: output


Transfer input file

(Figure) a.out is transferred to node 0 (ranks 0-7) and node 1 (ranks 8-15).

Transfer output file

(Figure) The file output on node 0 is collected as output.0 and the file output on node 1 as
output.8 in $MJS_CWD on the Login Server.

The rank number (first rank of the node) is added to the output file name before the file transfer.


1.2.4 sample 4 (Use rank format)

Content of job execution

Execute MPI execution module a.out as a 16-core parallel job. Transfer the output file of each rank
(rank number in 3 digits) to the job execution directory.

Item                         Value                      Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file                   (none)
Output file                  output.000 - output.015    One output file for each rank
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#BEFORE: a.out
mpirun ./a.out
#FTL_RANK_FORMAT: 3
#AFTER: output.$MPI_RANK

Input file transfer

It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).


Output file transfer

(Figure) output.000 - output.007 from node 0 (ranks 0-7) and output.008 - output.015 from
node 1 (ranks 8-15) are transferred back to $MJS_CWD on the Login Server.

1.3 FTL basic directory (FTLDIR command)


1.3.1 sample 1 (16 cores in parallel job)

Content of job execution

Transfer the input file necessary for each rank. Execute MPI execution module a.out as a 16-core
parallel job. Transfer the output file of each rank to the job execution directory.

Item                         Value                  Remark
Job execution directory      $MJS_CWD
FTL basic directory          $MJS_CWD
Execution module             a.out
Input file                   input
Output file                  output.0 - output.15   One output file for each rank
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTLDIR: $MJS_CWD
mpirun ./a.out


Transfer input file

(Figure) a.out and input in the FTL basic directory on the Login Server are transferred to node 0
(ranks 0-7) and node 1 (ranks 8-15).

Transfer output file

(Figure) output.0 - output.7 from node 0 are collected as output.0.0 - output.7.0, and
output.8 - output.15 from node 1 as output.8.8 - output.15.8, in $MJS_CWD/$MJS_REQID
on the Login Server.

Only files newly created during job execution are transferred. The rank number is added to the file
name.


1.3.2 sample 2 (File collect type: mtime)

Content of job execution

Transfer the input file necessary for each rank. Execute MPI execution module a.out as a 16-core
parallel job. Transfer the output file of each rank to the job execution directory.

Item                         Value                  Remark
Job execution directory      $MJS_CWD
FTL basic directory          $MJS_CWD
Execution module             a.out
Input file                   input                  The input file is updated during job execution
Output file                  output.0 - output.15   Newly created for each rank
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTL_COLLECT_TYPE: mtime
#FTLDIR: $MJS_CWD
mpirun ./a.out

Transfer input file

It is the same as 1.3.1 sample 1 (16 cores in parallel job).


Transfer output file

(Figure) Because the collect type is mtime, the updated input file is transferred in addition to the
newly created output files: output.0.0 - output.7.0 and input from node 0 (ranks 0-7), and
output.8.8 - output.15.8 and input from node 1 (ranks 8-15), are transferred into
$MJS_CWD/$MJS_REQID on the Login Server.


1.4 Others
1.4.1 Execute job using temporary directory

Content of job execution

Execute MPI execution module a.out as a 16-core parallel job. a.out needs a tmp directory in the
job execution directory of each rank.

Item                         Value      Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file                   (none)
Output file                  output     One output file for each rank
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#FTL_MAKE_DIR: $MJS_CWD/tmp
#BEFORE: a.out
mpirun ./a.out
#AFTER: output


Make directory

(Figure) The tmp directory is created under $MJS_CWD on node 0 (ranks 0-7) and node 1
(ranks 8-15) before the job starts.

Transfer input file

It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).

Transfer output file

It is the same as 1.2.3 sample 3 (Transfer files of same file name avoiding overwriting).


1.4.2 Execute job using meta character

Content of job execution

Transfer the same input files to each rank. Execute MPI execution module a.out as a 16-core
parallel job.

Item                         Value                Remark
Job execution directory      $MJS_CWD
Execution module             a.out
Input file                   input.0 - input.15   The same input files are necessary for each rank
Output file                  output               Output only on the MPI master node
Destination of output file   $MJS_CWD

Job execution script


#!/bin/sh
#MJS: -proc 16 -eo
#MJS: -cwd
#<BEFORE>
# a.out, input*
#</BEFORE>
mpirun ./a.out
#AFTER: 0@$MJS_CWD:output


Transfer input file

(Figure) a.out and all of input.0 - input.15 are transferred to node 0 (ranks 0-7) and node 1
(ranks 8-15).

Transfer output file

(Figure) output on node 0 (rank 0) is transferred back to $MJS_CWD on the Login Server.

