Welcome to the Colonial One wiki
For help, please contact your local support partner, or email the HPC support team.
A few outside links:
Colonial One-related presentations are kept in a shared presentations archive.
This site is under construction, and new content will be added continually as we work to bring additional users and applications online.
Colonial One Calendar
To request access, please email the support team from your GW email account, letting us know which research group you're in, and (for students) CC your advisor / PI.
Groups that have not directly contributed to the system will have limited access to spare cycles. If your group is interested in contributing to expanding the system, please get in touch and we'll be happy to discuss this with you.
For external researchers collaborating with GW research groups: a GW faculty member will need to request an affiliate account for you.
NOTE all users of Colonial One must agree to be subscribed to the Colonial One listserv. The Colonial One listserv is the official means of communication to our community of users and all important notices are sent via the listserv.
How to Acknowledge Colonial One
Have you used Colonial One or the Capital Area Advanced Research and Education Network (CAAREN) for an award, publication, or project? If so, let us know! Please take the time to fill out our acknowledgment survey to help us document and credit the use of these systems. Your responses will help us complete our annual reporting requirements and will help to showcase the value of these systems to GW.
New User Quick-Start
The hardware specifications page documents the current cluster hardware.
Connecting to Colonial One
The login nodes, which should be used only for submitting jobs, file transfers, software compilation, and simulation preparation, are reachable through a general login hostname; the two current login nodes can also be accessed directly by name.
You can use SSH, or SCP/SFTP, to connect. Globus
is also available for file transfers using the Colonial One endpoint. An overview of registering for and using Globus to connect to Colonial One can be found on the GlobusSetup page.
The SSH key fingerprint for all of the login nodes is 77:39:f1:fb:e4:6d:f4:38:bb:c6:ba:08:0e:b4:b8:e3.
Connections to the login nodes are authenticated using your GW NetID and password.
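As a sketch of a typical session, the commands below show logging in and copying files with SSH and SCP; the hostname login.example.edu is a placeholder, as this page does not give the actual login-node address - substitute the hostname provided by your support partner, and your own NetID:

```shell
# Log in with your GW NetID (you will be prompted for your GW password):
ssh yournetid@login.example.edu

# Copy a local file to your home directory on the cluster with SCP:
scp input.dat yournetid@login.example.edu:~/

# Copy a results directory from the cluster back to your machine:
scp -r yournetid@login.example.edu:~/results ./results
```

SFTP clients work the same way: point them at the login-node hostname and authenticate with your NetID and password.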
Connecting to Colonial One (Windows Users)
We recommend PuTTY, together with an SCP/SFTP client, for users connecting to Colonial One from Microsoft Windows. See the video tutorial below for a walkthrough of installation and set-up.
New: An additional SSH client with integrated X11 tunneling support is also supported on Colonial One.
There are two main filesystems on Colonial One. The first is connected over NFS, stores the /home and /groups directories, and has 250TB of usable space. The second is a high-speed Lustre scratch filesystem, accessed as /lustre. A third filesystem, /import, can be purchased for archival storage.
By default, you have access to three locations (with a fourth archival option available for purchase):
- /home/$netid/ - your home directory. Default quota of 25GB. NOTE: no jobs should be run against the /home partition; it is not designed for performance. Use /lustre instead.
- /groups/$group/ - shared group space, accessible by anyone in your group. (If you are unsure of what group you are in, the groups command will tell you.) Default quota of 250GB. NOTE: no jobs should be run against the /groups partition; it is not designed for performance. Use /lustre instead.
- /lustre/groups/$group/ - shared group scratch space
- /import/$group/ - archival storage that can be purchased. NOTE: no jobs should be run against the /import partition; it is for archival purposes only and is not designed for performance. Use /lustre instead.
Quotas are in place on the /home and /groups spaces, and the current disk usage for your home directory and all of the groups you belong to is displayed when you log in. You can check the current status at any time using the quota-reporting tool on the login nodes. Please note that the statistics are delayed by up to five minutes. If you need additional space, you must email the support team
with (1) an explanation of your space requirements, and (2) a short description of how your data is being transferred to and maintained on permanent storage elsewhere.
Space on the Lustre filesystem is subject to removal under the conditions set in the scratch purge policy. This ensures sufficient space remains available for ongoing work on the system.
Warning: Backups are your responsibility.
The NFS filesystems (/home and /groups) are backed up on a monthly basis as a disaster-recovery mechanism only. The /lustre
scratch space is never backed up. We cannot recover accidentally deleted files from any filesystem on Colonial One. We recommend storing your source code in version control hosted off the cluster.
Globus is recommended for large-scale data transfers, both between your own systems (laptop, desktop) and the cluster, and between the cluster and other external HPC centers such as XSEDE.
Colonial One's public endpoint name, along with an overview of signing up and using Globus, can be found on the GlobusSetup page.
Using Modules to manage your environment
Modules are used to manage different software environments on the cluster. The module avail
command will list the available modules on the cluster, and they can be added to your environment using
module load $modulename
As an example, to compile an OpenMPI program:
module load openmpi
mpicc -o hello hello_world.c
Modules can also be included in your job scripts to configure the job environment.
Submitting jobs on the cluster
The Slurm workload scheduler is used to manage the compute nodes on the cluster. Jobs must be submitted through the scheduler to have access to compute resources on the system.
There are two groups of partitions (aka "queues") currently configured on the system. One group is specifically designated for CPU jobs, i.e., non-GPU jobs:
- defq - default compute nodes, CPU only, with either 64, 128, or 256GB of memory
- 128gb, 256gb - explicitly request compute nodes with 128GB or 256GB of memory respectively, for larger-memory jobs
- debug - please see DebugPartition for more details
- short - has access to 128GB Ivy Bridge nodes with a shorter (currently 2-day) timelimit, designed for quicker turnaround of shorter running jobs. Some nodes here overlap with defq.
- 2tb - a special-purpose machine with 2TB of RAM and 48 3GHz CPU cores. Access to this partition is restricted; please contact the support team if you have applications appropriate for this unique system.
The other is specifically for jobs requiring GPU resources and should not be used for jobs that do not require GPU resources:
- gpu - has access to the GPU nodes, each has two NVIDIA K20 GPUs
- gpu-noecc - has access to the same GPU nodes, but disables error-correction on the GPU memory before the job runs
- ivygpu-noecc - has the same NVIDIA K20 GPUs, but with newer Ivy Bridge Xeon processors
- allgpu-noecc - combines both ivygpu-noecc and gpu-noecc
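Putting this together, a minimal GPU job script might look like the following. The --gres option is standard Slurm syntax for requesting GPUs; the exact GPU resource name, the cuda module name, and the executable are assumptions for illustration - check module avail and the partition documentation for the real names:

```shell
#!/bin/bash
# one GPU node in the "gpu" partition, one of its two K20 GPUs
#SBATCH -p gpu
#SBATCH --gres=gpu:1
# one hour timelimit (required - jobs without one are rejected)
#SBATCH -t 1:00:00

module load cuda        # assumed module name; verify with "module avail"
./my_gpu_program        # your CUDA executable (hypothetical name)
```

Request --gres=gpu:2 if your code can use both GPUs on a node.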
Note that you must set a timelimit (with the --time
flag; for example, --time 1:00:00
would set a limit of one hour) for your jobs when submitting, otherwise they will be immediately rejected. This allows the Slurm scheduler to keep the system busy by backfilling smaller jobs
when trying to allocate resources for larger jobs. The maximum time-limit for any job is 14 days, but you are encouraged to keep jobs limited to a day - longer-running processes should checkpoint and restart to avoid losing significant amounts of outstanding work if there is a problem with the hardware or cluster configuration. We will not under any circumstances increase time limits for jobs that have already begun, so please estimate your required time carefully and request a time limit accordingly; we can provide guidance to help you better understand your job's requirements if you are unsure. The time-node limit for a single job may not exceed 224 node*days, which prevents any single job from allocating more than 16 nodes for 14 days.
As an example, here is a simple MPI job script, which could be submitted with sbatch:
#!/bin/bash
# one hour timelimit:
#SBATCH --time 1:00:00
# default queue, 32 processors (two nodes' worth)
#SBATCH -p defq -n 32
module load openmpi
mpirun ./hello
The Slurm documentation has further details on some of the advanced features, and the Slurm Quick-Start User Guide
provides a good overview. The use of Job Arrays (see Job Array Support
) is mandatory for anyone submitting a large quantity of similar jobs.
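A job array replaces many near-identical submissions with a single script. The sketch below uses the standard Slurm --array flag and SLURM_ARRAY_TASK_ID environment variable; the input file names and the analyze executable are hypothetical:

```shell
#!/bin/bash
# run the same analysis over inputs 1..100 as one array job
#SBATCH -p short
#SBATCH -t 1:00:00
#SBATCH --array=1-100
# %A is the array job id, %a the task index
#SBATCH -o array%A_%a.out

# each task receives its own index in SLURM_ARRAY_TASK_ID
./analyze input_${SLURM_ARRAY_TASK_ID}.dat
```

Submitted once with sbatch, this queues 100 tasks that the scheduler can run and backfill independently.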
A serial MATLAB job script looks similar:
#!/bin/bash
# set output and error output filenames, %j will be replaced by Slurm with the jobid
#SBATCH -o testing%j.out
#SBATCH -e testing%j.err
# single node in the "short" partition
#SBATCH -N 1
#SBATCH -p short
# half hour timelimit
#SBATCH -t 0:30:00
module load matlab
# forward the MATLAB license-server ports through login4
# (replace <license-server> with the license server's address)
ssh login4 -L 27000:<license-server>:27000 -L 27001:<license-server>:27001 -N &
# test.m is your matlab code
matlab -nodesktop < test.m