
Running Fluent on CARES

Graduate Students
Clarkson University, Potsdam, NY
December 19, 2007

Abstract
This manual is intended as an aid to help graduate students gain
access to the high performance computational cluster, CARES. The
first section will introduce new users to the terminology and commands
used on the cluster, as well as where to get the proper programs and
files. Fluent specific files are discussed in section two, including files
needed to run Fluent in batch format and files generated by Fluent
to monitor solutions. Section three introduces the queue system used
to submit jobs to the cluster, as well as how to monitor your jobs
on the queue. This manual finishes with some information on how to
add/modify this document.
This document is a work in progress. If there is anything that is
unclear or a specific aspect which you feel deserves more discussion,
please make the changes. You can find the *.tex file (here). Thanks a
bunch for reading, best of luck!!!

1 Basics

1.1 User Account

The first step to using the CARES cluster is to contact Dr. Brian Helenbrook. Once your access has been approved you will need to fill out paperwork at the Clarkson Help Desk to create an account. The Help Desk is
located on the second floor of the ERC. Josh Fiske is in charge of the cluster,
but you won't be dealing with him directly at this time. All that is necessary
is to fill out the paperwork for the username and password. Without a user
account you will not be able to access the cluster. Expect to wait a couple
of days before your account is activated.

1.2 SSH

The cluster is accessed using a Secure Shell (SSH). There are several software packages that can be used to access the cluster. The SSH Secure Shell Client
(download SSHSecureShellClient-3.#.#.exe here) is an exceptionally good
one, with both a GUI and TUI combined with a file transfer protocol (ftp).
Another useful package is PuTTY, but this requires a separate ftp.
Another equally fine SSH package is WinSCP, which can be found readily
online or from Clarkson's OIT software page (here). This is a nice package
that is freely available under the GNU General Public License, and is what
the current author now prefers, along with PuTTY for a text command
interface. (These programs can be stored on a flash drive and used at almost
any location as well.)
The next step is to set up whichever SSH client you choose to access the
CARES cluster. If using SSH Secure Shell Client: under the Edit->Settings->Tunneling panel check the Tunnel X11 connections box. Next, connect
to the remote host via the Connect or Quick Connect menu buttons. Within
the pop-up window type in Host Name: cares.clarkson.edu, followed by
your user name, and click the Connect button. An additional window will
prompt for your password. If all went well, congratulations, you're on the
cluster. If using WinSCP: create a new session and save it as CARES or
something similar. Specify your host name as cares.clarkson.edu and your user
name in the initial session information panel. Be sure the file protocol is set
to SFTP. Save your session information and click the Login button to access
your account on the cluster.
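If you prefer a plain command line to the GUI clients, the same connection
can be sketched with standard OpenSSH tools. This is an illustrative
fragment only; username is a placeholder and wing_case is a hypothetical
directory name.

```bash
# Connect to the cluster with X11 forwarding (-X), analogous to the
# "Tunnel X11 connections" option in the GUI client
ssh -X username@cares.clarkson.edu

# Copy a case file to the cluster with the companion scp tool
scp fluent.cas username@cares.clarkson.edu:~/wing_case/
```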

1.3 UNIX Commands

Once access to the cluster has been established, you need to know some
basic Unix commands. Many of these commands can be found by simply
searching the web for Unix commands.

File Commands
cd          change directory
clear       clear screen
cp          copy files
info        information on commands
ls          list files in directory
man         manual on commands
mkdir       make directory
rm          remove file/directory
rmdir       remove empty directory
qdel        delete job
qstat       status of running jobs
du -h -s    disk space used

Program Commands
nano        text editor
mozilla     browser to view queue jobs (requires EXCEED)
pico        text editor
qmon        GUI status of running jobs (requires EXCEED)
vi          text editor
Perhaps the two most useful commands for those first learning how to use
Unix commands are man and info. The majority of Unix commands
have built-in descriptions, instructions for use and details on options. For
particulars on any command simply type man cd, info nano, man
vi, man COMMANDNAME, etc., in the command line. To exit from
these help documents, type q.
A good practice is to create a directory in your home directory (labeled
with your user name) to hold your .cas, .dat, .flin, and .bash files. These
files are discussed in Section 2. Maintaining a logical file folder structure
can save headaches later on.
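As a concrete sketch, the suggested layout might be created as follows.
The case directory name wing_case is hypothetical; pick something
descriptive for your own work.

```shell
# Start from the home directory and make a per-case directory
cd "$HOME"
mkdir -p wing_case
cd wing_case

# The .cas, .dat, .flin and .bash files for this case would all live
# here; confirm the layout with ls
ls
```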

1.4 Blades and Nodes

There is some terminology used in the cluster community. The cluster is
currently composed of 42 blades and 84 nodes. This is analogous to 42 dual-processor computers, where the blades are the computers and the nodes
are the processors. Each blade can run two different jobs, one node to each
job. The blades are named (i.e. COMPUTE-0-1) and the nodes are not, so
each blade contains 2 nodes under the same name.
In parallel processing more than one node is used to run one job. The
nodes are then broken into a HOST node and SLAVE nodes. The host node
governs the parallel computing, taking data from different slaves and passing
it to the other slaves. The host node itself does not contain any data, just the
means to pass the data from slave to slave. This is important to know if
you plan to use UDFs on the cluster.

1.5 What Will I Gain?

The basic idea of parallel computing is that if a large computation can be
split up between multiple computers, the amount of time required to obtain
a solution will be reduced. Thus, the first and obvious benefit is a faster
solution to your problem. The ability to run parametric studies with relative
ease in batch mode is another skill which can be gained through the proper
use of the Fluent text commands. And finally, you free up one additional
computer (your own desktop!!) to complete other work.
A few tests have been conducted on the Clarkson CARES Cluster that
show up to a four-fold reduction in time to solution for large Fluent models
on 6 nodes. This particular example was a 1.2 million cell, 3d, multi-phase
simulation; the majority of Fluent cases will have substantial computational
speed increases with 2 or 4 nodes. Typically, three to four times faster than
on a single-processor PC is the upper limit of what you can expect to
achieve.
Note that Fluent does not run effectively on more than 8 nodes. The
amount of message passing between nodes slows down the computational
speed above this limit. So don't simply choose 8 nodes for every simulation:
your mesh would have to be upwards of 5 million cells to justify the use of 8
nodes. If your geometry is composed of this many cells, consider simplifying
it and reducing the cell count, especially if you're a new user to Fluent.
Even a steady case would take a considerable amount of time to run. Initially, a new user should start with simple geometries and meshes and work
up to their final configuration.

2 Important Files

In addition to the standard Fluent .cas and .dat files, there are two files
which are used to open Fluent and define the commands to be input to the
open program. You will find it advantageous to place all of these files in the
same directory.

2.1 .bash

The .bash file is used to open the Fluent program, create the error and
output files, define the number of nodes to be used, and specify the file
where the Fluent commands are located. An example fluent.bash file has
been created and annotated by Dr. Helenbrook and can be found here. The
majority of this file does not need to be altered for standard Fluent usage. Please
take the time to read the comments within this file and try to understand
what is being done. The lines that do need to be altered for your specific
usage are as follows:
fluent.bash Important Lines
Line 6    #$ -N my_test_job
          Change my_test_job to a short description of your case.
Line 22   #$ -o output
Line 23   #$ -e errors
          OPTIONAL: set the output and errors file names.
Line 27   #$ -l h_rt=01:31:00
          Set the walltime large enough to ensure obtaining a solution.
Line 30   #$ -pe fluent_pe 2
Line 48   . . . fluent_pe $NSLOTS -pgmpi . . .
          The number of nodes desired must be set the same in both lines,
          i.e. replace $NSLOTS with 2 in this example.
Line 48   . . . -g -i <input.file>
          Replace <input.file> with your .flin file name.
More often than not, first-time users make mistakes in this file. It is
important to be sure that the final line of code is on a single line. The
second line occurs in the example because of the limited screen width in
most text editors. Turn off word wrapping or other default settings in
your text editor to correct this. Or simply ask someone to run you through
the process; there is always someone using Fluent on the cluster. Track
them down.
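Pieced together from the table above, the relevant portion of a fluent.bash
script might look like the following skeleton. This is an illustrative sketch,
not Dr. Helenbrook's annotated script: the surrounding boilerplate, exact line
positions, and the TwoD_DoubPrec.flin file name should be taken from, or
checked against, your own copy of the example file.

```bash
#!/bin/bash
#$ -N my_test_job            # Line 6: short description of your case
#$ -o output                 # Line 22: output file name (optional)
#$ -e errors                 # Line 23: errors file name (optional)
#$ -l h_rt=01:31:00          # Line 27: walltime, large enough to finish
#$ -pe fluent_pe 2           # Line 30: number of nodes requested

# Line 48: must be a SINGLE line in the real file; the node count (2 here)
# must match Line 30, and TwoD_DoubPrec.flin is a hypothetical .flin name
$FLUENT_INC/bin/fluent 2ddp -sge -sgepe fluent_pe 2 -pethernet -cnf=$TMPDIR/machines -mpthostfile=/etc/hosts_for_fluent -ssh -g -i TwoD_DoubPrec.flin
```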

2.1.1 Last Line of the .bash File

As can be noted from the previous table, Line 48 of the code is quite important. Two examples are shown here to stress this line's importance.
Remember that this code needs to be on a single line in your .bash file;
the dual-line usage here is due to limited page width.
Example 1: 2d Double Precision Solver Set to Run on Two Nodes
$FLUENT_INC/bin/fluent 2ddp -sge -sgepe fluent_pe 2 -pethernet -cnf=$TMPDIR/machines
. . . -mpthostfile=/etc/hosts_for_fluent -ssh -g -i TwoD_DoubPrec.flin
Example 2: 3d Single Precision Solver Set to Run on One Node
$FLUENT_INC/bin/fluent 3d -sge -sgepe fluent_pe 1 -cnf=$TMPDIR/machines
. . . -mpthostfile=/etc/hosts_for_fluent -ssh -g -i OneD_SinglePrec.flin
The first text in Line 48 specifies the location of Fluent and should not be
changed. Once Fluent is called it needs to know which version it should be running
(2d, 2ddp, 3d or 3ddp). The next important change to note is where the number
of nodes is specified, following fluent_pe. For jobs to be run in parallel, this
number needs to be the same as that specified on Line 30. The command -pnmpi tells
Fluent to use network message passing, a slower but more stable message-passing
procedure suitable for unsteady Fluent calculations. For single-node jobs, remove
the message passing command and set the number of nodes to 1. The final note on
this line is the very last command, the name of the file where you have stored the
Fluent text commands to run your simulation in batch mode. This is your .flin
file.

2.2 .flin

The .flin file is the last entry in the .bash file which executes the Fluent program.
This is a text document containing the sequential commands you wish Fluent to execute. These commands have the same format that would be entered at the Fluent
console when using the text user interface (TUI) in Windows. For example,
to read a case file, initialize, iterate (unsteady) and write the case and data,
the .flin file would contain. . .
/file/read-case fluent.cas           read case
/solve/initialize/initialize-flow    initialize flow
/solve/dual-time-iterate 500 20      iterate 500 timesteps, 20 iterations/timestep
/file/write-case-data fluent         write case and data

Instead of /file/read-case, abbreviations like /fi/rc can also be used. To
ensure these commands will work it is good practice to first test them on your own
workstation, using either the TUI or Fluent journal files. Journal files are identical
workstation, using either the TUI or Fluent journal files. Journal files are identical
in structure to .flin files, simply a list of the commands you wish to use. It is very
good practice to ensure your commands work before using them on the cluster;
nobody wants their simulation to crash after it's been running for days because
they forgot a space or comma!!
Further information on writing journal files and accessing data via text commands can be found in the Fluent literature. A short list of text commands that
you are likely to encounter/desire are included in the last section of this manual,
as well as short descriptions of their use.

2.3 .cas and .dat

The case and data files are easier to create on your workstation and later move
onto the cluster. With SSH Secure Shell Client or WinSCP you simply use the file
transfer GUI. From here it is only a matter of dragging the files from your computer
to the CARES directory of your choice.
It is also advantageous for you to run the case file for several iterations or
timesteps (depending on your formulation) on your machine to be sure that the
case will run as planned.
The cluster cannot run any graphics, so don't try to plot any figures. If you do,
errors transpire and you'll have to delete the job (see Section 3.3). This includes
animations, residuals and monitors. If you want to make animations, write the
data files and use TecPlot on your PC. Your output file is where you will monitor
the convergence of your simulation.

2.4 output and errors

The output file prints what would be viewed in the TUI of Fluent if you were
running on a workstation. This information will also give more specific insight into
where errors occurred or what the error was. Changing the output file name to
include some basic case information and a Windows-acceptable extension can be
beneficial (i.e. Output_Wing.txt).
The error file provides information about any errors that occur. By default
there are certain things printed in this file, so do not be alarmed if text is written
here.

2.5 cleanup-fluent-sge-Process-ID

This is a file created by Fluent/CARES to remove processes from the nodes once
your simulation has finished. If your process is terminated prior to the desired
completion (i.e. you exceed your wall time, an error occurs, etc) this file will not be
removed and you must delete it manually. Be forewarned, this is a good indication
that you have processes on the nodes still taking up space that need to be manually
removed as well (See Section 3.3). If everything goes as planned, this file will be
removed at the end of your simulation and will clean off the node processes as well.

3 Jobs

3.1 Submitting a Job

Once the .cas, .dat, .flin and .bash files have been created, tested and placed
within a single directory you are ready to submit your job to the cluster. To submit
the job simply type
qsub file.bash
in the directory where your files are located. After a few seconds output and
error files should be created with the set-up information (type ls to list directory
contents). The output file will have a table of the nodes used at the top along with
the Job-ID and other setup information. The error file will have a few lines of setup information (to view these files, type nano filename). This step usually occurs
without incident. Occasionally, the submission will fail for no apparent reason. If
the error file says "FLUENT could not be found" or "Cannot log into node-X-X
without password" there is a good chance that your submission has failed. You
should check the status of your jobs when this happens.
When the errors described in the last paragraph occur during setup there is little
that can be done except to 1) remove the new files (error and output) from your
current directory, 2) check to make sure that you have no hanging nodes, and
3) resubmit the job. Sometimes waiting until certain nodes are in use by other
users helps; sometimes changing the number of nodes the case is set to run on helps.
Try not to get discouraged.
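Putting the submission steps together, a typical session might look like the
fragment below. This is a sketch only: the directory, script, and user names
are hypothetical placeholders, and the queue commands only work once you are
logged in to the cluster.

```bash
cd ~/wing_case          # directory holding the .cas, .dat, .flin and .bash files
qsub fluent.bash        # submit the job to the queue
ls                      # after a few seconds, output and errors should appear
nano output             # check the node table and Job-ID at the top
qstat -u username       # confirm the job reaches the r (running) state
```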

3.2 Monitoring a Job

Once you have submitted a job to the cluster you need to know what its status is.
Type qstat to see the status of all the jobs in the queue. You should see your job
along with other users' jobs in a list. If you choose to look at only your jobs, qstat
-u username will show you your jobs only. qstat -f will print the full
queue information in a more readable format. The status of the job is indicated in
the fourth column of data, under the heading of state.
Directly after submission the job will be in queue (qw). Once the initial file
has been read in, the status changes to t, or transferring the information to the
nodes. If this is successful, the status will change to r, or running. If the status
returned is Eqw, there was an error while in queue; typically this is a problem
with your user account and you should contact the person in charge of the cluster.
The first column of data printed with qstat is the process identification number
(PID). This is an important number; each process has its own. This is the same
number that ends the file name of the kill file created when you submitted the
job. The third column of the qstat print out is the name you specified in Line 6 of
your .bash file. The PID and name of the process will occur in the queue list once
for processes running on only one node. When jobs are run in parallel there will
be one listing of the process name for each slave and master node used (# nodes
requested + 1). Column 8 lists the blade on which the job is running. If column 8
is not fully shown, try the qstat -f command.
Not all processes are shown in the queue status. These hanging nodes are
processes that have ended improperly on the cluster. These jobs can continue to
utilize resources on the cluster, often taking computational ability or storage space
from other users. This is one of the few ways users can affect the work of
others on the cluster. To obtain a list of all the processes running on the cluster
under your username the command
cluster-fork ps -U username
can be used (don't forget the -U). cluster-fork is an extension of the forking
command which I don't know much about, but it allows commands to run on all
the blades. The above example sends the command ps (report process status) to
all of the blades with the username option specified. Compare the blades reported as
being occupied by cluster-fork and qstat -f; any process showing up with cluster-fork
and not with qstat -f is a hanging job and should be cleaned (deleted from
the queue).

3.3 Deleting a Job

Deleting a job is somewhat more difficult than submitting one. To stop a running
simulation type
qdel PID
This will delete the job from the queue. The PID can be found using qstat and in
the output file. It is also listed at the end of the cleanup-fluent-sge-PID file
Fluent/CARES creates in your directory. The major difficulty lies with hanging
nodes, which will only be a concern in parallel processing. After deleting jobs from
the queue you should check to see if the node was properly cleaned off using the
cluster-fork ps -U username command. If there are processes still running, you
will need to connect directly to the affected blade.

3.3.1 Connecting Directly to Blades

To connect directly to a blade type
ssh compute-x-x
where compute-x-x is the affected blade name. Once you are connected you can
view the processes running under your name with ps -u username. By default
there will be 3 processes on each blade (sshd, bash, and ps); removing these
processes will terminate your connection to the blade. Any additional processes
running are hanging jobs and need to be removed.
To delete individual processes type
kill -9 PID

where the PID will be the individual process identification in the first column of
data listed by the ps -u username command. Do not worry about sharing
a blade with another user and killing their jobs; you do not have access. Know
which nodes you are running on: if you delete the output file it is more difficult
to determine which nodes you are using.
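The filtering described above can be mimicked on a saved ps listing: strip the
three default processes and whatever remains is a candidate hanging job. The
listing below is hypothetical example data, not real cluster output.

```shell
# Save a hypothetical `ps -u username` listing to a file
cat > ps_listing.txt <<'EOF'
  PID TTY          TIME CMD
 4021 ?        00:00:00 sshd
 4022 pts/0    00:00:00 bash
 4100 ?        12:41:03 fluent
 4101 ?        12:41:00 fluent
 4150 pts/0    00:00:00 ps
EOF

# Skip the header and keep rows whose command is not one of the three
# defaults; what is left are PIDs you would hand to `kill -9`
awk 'NR > 1 && $4 != "sshd" && $4 != "bash" && $4 != "ps" {print $1, $4}' ps_listing.txt
```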
Another effective way to delete the processes is to ssh to the host node,
labeled as MASTER with qstat or Host in the output file. After ssh-ing to the
host node type
killall ssh
This seems to delete all pertinent processes on the nodes that were associated with
your job. After the first couple of times, check the other nodes. To exit the blade
properly use the exit command and you will return to your CARES account.
To completely clean the cluster of all your jobs the command
cluster-fork killall ssh
will remove all processes, sending a killall command to all of the blades. This will
remove all jobs (hanging and not hanging) from the cluster, so make sure to only
use this when there are no jobs of interest running.

4 Additional Material

4.1 How to Improve this Document

Just to reiterate, this document is intended to be updated periodically and with
this upkeep will hopefully become a useful document. This reworking is required
because:
- The CARES Cluster is slated to be upgraded in several ways.
- Software continually gets updated.
- Engineers are traditionally poor writers.
- If it is not clear, the learning curve to cluster usage can be steep and painful.
- Improper cluster usage can negatively affect current users.
Any improvements or complaints are welcome. If you wish to alter this document
directly, the LaTeX files used to generate the document are readily available from
the site from which you downloaded the document (here). If you're interested in
learning how to write documents in LaTeX, please feel free to look through this file
as well.


4.2 Storage Space

At the present time space on the cluster is at a premium. An expansion of space is
planned, but even after this occurs please be aware of the repercussions of using
too much space. If the storage space becomes filled, data files cannot be written by
anybody using the cluster. This is one of the few ways your cluster activities can
affect other users.
As a rule of thumb, remove files/data that are no longer needed to run your
current jobs. Keeping the amount of space you use under 10GB will allow everybody
to have equal access to the CARES cluster. To determine the amount of space you
are currently using, navigate to your home directory and type
du -s -h
which is the disk usage command with the summarize and human-readable options. If
you go over 20GB you will receive a warning and your CARES cluster usage will
be examined. The cluster is not intended to be used as a storage facility.
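As a sketch, the 10GB guideline can be checked with a few lines of shell.
This is not a tool provided on the cluster, just an illustration; it uses
du -s -k for a machine-readable size in kilobytes.

```shell
# Report a directory's disk usage against the (assumed) 10GB guideline
check_quota() {
    dir=$1
    limit_kb=$((10 * 1024 * 1024))              # 10GB expressed in KB
    used_kb=$(du -s -k "$dir" | awk '{print $1}')
    if [ "$used_kb" -gt "$limit_kb" ]; then
        echo "over the 10GB guideline: ${used_kb}KB used in $dir"
    else
        echo "within the 10GB guideline: ${used_kb}KB used in $dir"
    fi
}

check_quota "$HOME"
```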

4.3 A Short List of FLUENT Text Commands

This is just a short list of commands; the best way to learn how to use a command
within the batch command structure of the CARES cluster is to use these commands
in the TUI of Fluent while running on your desktop. The Fluent documentation
is extremely helpful as well.
Basic Setup and Checking Commands
/file/read-case Section4
    Read Section4.cas (file extension not required in command)
/file/rc Section4
    Same as above, abbreviated version
/grid/check
    Standard grid check
/grid/reorder/reorder-domain
    Reorder grid domains
/g/r/rz
    Reorder grid zones (abbreviated)
/report summary yes SUMMARY.txt
    Write full case summary

Solution Control Commands
/solve/set/ur mom 0.8
    Momentum under-relaxation factor
/solve/set/ur press 0.5
    Pressure under-relaxation factor
/solve/monitors/residual/convergence-criteria 0.0001 0.001 0.001 1e-6
    Set the residual convergence values for continuity, x-vel, y-vel and energy
/solve/set/time-step 5e-6
    Set time-step value for unsteady calculations
/solve/dual-time-iterate 50 60
    Iterate for 50 steps with a max 60 iterations/time-step
/solve/iterate 300
    Iterate maximum 300 times (steady)

Data Writing Commands
/report/sa vin outflow () ()
    Surface-average of mass-flow at vin and outflow
/report/surface-avg () pressure ()
    Pressure average at all surfaces
