Data (Hexagon)


Available file systems

Available file systems on Hexagon are:

  • /home
  • /work
  • /shared

For details about each storage area, see the documentation below.

Note: Hexagon nodes do not have internal disks, so there is no local /scratch file system available on any of the nodes.

Data handling and storage policy can be found here.

User area (home directories): /home

/home is the file system for user home directories ($HOME) on Hexagon.

This file system is relatively small and not mounted on the compute nodes, so it can NOT be used for running jobs. Quota is enabled; the limits can be found here. Files are backed up daily, except for folders named "scratch" or "tmp" and their sub-folders.

Work area (temporary data): /work/users

/work is a large external storage system shared by all compute nodes on Hexagon. Files are NOT backed up. /work is a Lustre file system where the user can specify the stripe size and stripe count to optimize performance (see the examples in Data (Hexagon)#Management of large files).

Only /work and /shared should be used when running jobs, since /home is not mounted on the compute nodes and there is no local scratch file system. An overview and comparison of /work and /shared can be found here.

Note: The /work/users/* directories are subject to automatic deletion depending on modification time, access time and the total usage of the file system. The oldest files will be deleted first. You can find more information about the deletion policy in the Data handling and storage policy document.
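
A hedged sketch for spotting deletion candidates in your own work area (the 30-day threshold below is only an illustration; the actual thresholds are defined in the policy document):

# List files under your work area that have not been modified for more than 30 days
lfs find /work/users/$USER -type f -mtime +30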

Shared storage: /shared (upon request)

/shared is a Lustre file system available on the compute nodes. It is intended for project data and other data shared between users and projects.

/shared is a "permanent" storage. It is not subject to automatic deletions (like /work/users), but it has no backup. For comparison between /work and /shared see here.

The maximum space a project can request is 10TB. A project or group can purchase extra storage nodes to have this limit extended; please contact us for more details. Group ownership on this file system is enforced according to project quota allocations by a nightly cron job.

What area to use for what data

/home should be used for storing tools, such as application sources and scripts, and any other data that must have a backup.

/work/users should be used for running jobs and as the main storage during data processing. After processing, all data must be moved off the machine or deleted.

/shared should be used for sharing data between projects, for permanent files and for running jobs.

Policies regarding deletion of temporary data

Detailed policies can be found here.

Overview of /work and /shared

Feature                           /work   /shared
Available on the compute nodes    Y       Y
Automatic deletion                Y       N
Backup                            N       N
Group enforcement                 N       Y
Performance                       Best    Slightly worse
Quota                             N       Y

Transferring data to/from the system

Only SSH access is open to Hexagon, so data can only be uploaded or downloaded with scp and sftp. On special request it is possible to bring hard drives to the data center and have the data copied directly by the support staff; please contact Support to arrange this.

To transfer data to and from Hexagon, use the address:

hexagon.hpc.uib.no

All login nodes on Hexagon now have 10Gb network interfaces.

Basic tools (scp, sftp)

The standard scp command and sftp clients can be used.

Please have a look at the Getting started section for a list of programs and examples.
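
A minimal sketch of transferring files with the standard tools (the user name, file names and target directory are hypothetical):

# Copy a local file to your work area on Hexagon
scp results.tar.gz myuser@hexagon.hpc.uib.no:/work/users/myuser/
# Copy a file from Hexagon back to the current local directory
scp myuser@hexagon.hpc.uib.no:/work/users/myuser/results.tar.gz .
# Start an interactive transfer session
sftp myuser@hexagon.hpc.uib.no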

Disk quota and accounting

Default quotas

User quotas

By default you get a soft quota and a hard quota for your home directory.

/home has a 20GB hard limit and a maximum of 1M files enforced per user. The grace time is set to 7 days: if the soft quota is exceeded for more than 7 days, or the hard quota is exceeded, you will not be able to create new files or append to files in your home directory.

You can check your disk usage (in KB), soft quota (quota), hard quota (limit) and inodes (limit) with the command:

quota -Q

OR in human readable form with:

quota -Qs

Note: Intermediate files with the STDOUT and STDERR of running jobs are placed on the /home file system. If a job writes a lot of output to STDOUT/STDERR, it is recommended to redirect it in the script directly to the /work file system (e.g.: aprun ... executable >& /work/users/$USER/somelog.file) instead of using the "#SBATCH --output" switch. See Job execution (Hexagon)#Relevant examples.
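
A minimal job script sketch illustrating the redirection (the executable name, task count and log file name are hypothetical):

#!/bin/bash
#SBATCH --ntasks=32
# Write the application's STDOUT/STDERR directly to /work instead of /home
aprun -n 32 ./my_executable >& /work/users/$USER/my_job.log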

Group quotas

/shared has a 10TB hard limit quota enforced per group.

You can check your disk usage on the /shared file system with the command:

lfs quota -g my_group /shared

Request increased quota (justification)

Users with a strong demand for disk space can contact Support with a request to increase their quota. Depending on the requirements, a solution on another file system may be offered.

Management of large files

For files located on a Lustre file system, such as /work, you may want to change the striping depending on the client access pattern. By doing this you will load the OSTs optimally and can get the best throughput.

For large files it is better to increase the stripe count (and perhaps the stripe size):

lfs setstripe --stripe-size XM --stripe-count Y "dir"

e.g.

lfs setstripe --stripe-size 8M --stripe-count 4 "dir" # stripe across 4 OSTs using 8MB chunks.

Note that the striping will only take effect on new files created/copied into the directory.
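
A hedged sketch of checking the layout and making an existing file pick up the new striping (the file name is hypothetical):

# Show the striping of the directory and of an existing file
lfs getstripe "dir"
lfs getstripe "dir"/bigfile.dat
# Existing files keep their old layout; copying creates a new file that
# inherits the directory striping, after which it can replace the original
cp "dir"/bigfile.dat "dir"/bigfile.dat.new && mv "dir"/bigfile.dat.new "dir"/bigfile.dat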

See more in Lustre.

Management of many small files

For files located on a Lustre file system, such as /work, you may want to change the striping depending on the client access pattern. By doing this you will load the OSTs optimally and can get better throughput.

For many small files, each accessed by a single client, change the stripe count to 1:

lfs setstripe --stripe-count 1 "dir"

Note that the striping will only take effect on new files created/copied into the directory.

See more in Lustre.

Copying files in parallel

Normally you will copy files on the login nodes, but there are cases when you need to copy a large amount of data from one parallel file system to another.

In this case we recommend using special tools optimized for parallel copying. In general, any big copy operation inside a job script will benefit. For copying on the compute nodes only /work and /shared are available, and we recommend pcp.

pcp

pcp uses MPI. Only /work and /shared are supported.

Use a normal job script (see Job execution (Hexagon)#Create a job (scripts)) to spread the copying over a few nodes.

Example:

#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
# 16 MPI processes, 8 on each node
# you will be charged for 64 cores
...
module load pcp
cd /work/users/$USER
aprun -B pcp source-dir destination-dir

If the copying is part of a computation script, keep the batch directives the job already has and just use "aprun -B pcp ...".

PCP project page

mutil

Mutil can use MPI and threads.

Usually pcp is faster than Mutil, especially with many small files, and we strongly recommend using pcp with aprun in job scripts. There are situations where Mutil may be preferred, such as copying on a login node with threads; you can try both and see which better suits your task. The --mpi version is less stable than pcp and you may get "DEADLOCK" errors.

"mcp --threads=" - works on any file system and can be used on login node.

Mutil has 2 commands:

* mcp - parallel copy command with the same syntax as "cp"
* msum - parallel "md5sum"

"mcp" has the same syntax as "cp", in addition it has the following options:

     --buffer-size=MBYTES     read/write buffer size [4]
     --direct-read            enable use of direct I/O for reads
     --direct-write           enable use of direct I/O for writes
     --double-buffer          enable use of double buffering during file I/O
     --fadvise-read           enable use of posix_fadvise during reads
     --fadvise-write          enable use of posix_fadvise during writes
     --length=LEN             copy LEN bytes beginning at --offset
                                (or 0 if --offset not specified)
     --mpi                    enable use of MPI for multi-node copies
     --offset=POS             copy --length bytes beginning at POS
                                (or to end if --length not specified)
     --print-stats            print performance per file to stderr
     --print-stripe           print striping changes to stderr
     --read-stdin             perform a batch of operations read over stdin
                                in the form 'SRC DST RANGES' where SRC and DST
                                must be URI-escaped (RFC 3986) file names and
                                RANGES is zero or more comma-separated ranges of
                                the form 'START-END' for 0 <= START < END
     --skip-chmod             retain temporary permissions used during copy
     --split-size=MBYTES      size to split files for parallelization [0]
     --threads=NUMBER         number of OpenMP worker threads to use [4]

So a copying example on the login node with threads can look like:

module load mutil
mcp --threads=4 --print-stats -a source-folder destination-path
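
msum is described above as a parallel md5sum; a minimal sketch of verifying a copy, assuming msum accepts the same arguments as md5sum (file names hypothetical):

module load mutil
# Checksum the original and the copy, then compare the output
msum source-folder/large-file.dat
msum destination-path/large-file.dat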

Mutil project

Compression of data

Infrequently accessed files must be compressed to reduce file system usage.

Parallel tools (pigz, pbzip2, ..)

Hexagon has threaded versions of gzip and bzip2. They have almost linear scaling inside one node. Please see the job script example below for how they can be used:

#!/bin/bash
#SBATCH --ntasks=32
export OMP_NUM_THREADS=32 
# load pigz and pbzip2 
module load pigz pbzip2  
# create tar file using pigz or bzip2 
cd /work/users/$USER
aprun -n1 -N1 -m30000M -d32 tar --use-compress-program pigz -cf tmp.tgz tmp  # example for parallel gzip
aprun -n1 -N1 -m30000M -d32 tar --use-compress-program pbzip2 -cf tmp.tbz tmp  # example for parallel bzip2

Tools (gzip, bzip2, ..)

Tools like gzip, bzip2, zip and unrar are in the PATH and are always available on the login nodes. Use the man command to get detailed info.

man bzip2

If you need to perform packing/unpacking/compressing on compute nodes (recommended for very big files), please load module coreutils-cnl. E.g.:

module load coreutils-cnl
cd /work/users/$USER
aprun -n 1 tar -cf archive.tar MYDIR

Binary data (endianness)

Hexagon is an AMD Opteron-based machine and therefore uses the little-endian byte order. [1]

Fortran sequential unformatted files created on big-endian machines cannot be read on a little-endian system. To work around this issue, you can recompile your Fortran code with one of the following flags (see the sketch after the list):

  • -byteswapio - for the PGI compiler
  • -fconvert=swap - for GNU Fortran
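
A minimal sketch, assuming GNU Fortran and a hypothetical source file; on the Cray programming environment the ftn compiler wrapper would typically be used instead of calling the compiler directly:

# Rebuild with byte swapping for unformatted I/O (GNU Fortran)
gfortran -fconvert=swap -o my_prog my_prog.f90
# Equivalent with the PGI compiler
pgf90 -byteswapio -o my_prog my_prog.f90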

Back-up of data

Hexagon is connected to a secondary storage device (tape robot). The tape robot is used for the storage of backup data and archiving.

For backup policies, please consult the Data handling and storage policy document.

Should you need to restore files from backup, please contact Support.

Closing of user account

Closing of UiB user accounts is controlled by the user's department HR manager.

In the case of IMR or an organization that is affiliated with UiB (or has a contract for usage of the machine), the project leader has to inform Support as soon as the contract termination/suspension date is known for the specific user.

Account closure implies blocking of user access and deletion of all user-related data in /home.

Privacy of user data

When a user account is created, access to the user's home and work folders is set so that they cannot be accessed by other users (mode 700).

A user might choose to share his/her scratch (work) area with other users in the same group by running:

chmod g+rX /work/users/$USER # work folder access

The user home directory must not be shared with anybody, and its permissions are regularly checked/adjusted. For more information, see the Data handling and storage policy document.

The project responsible can request via Support the creation of an additional UNIX group. This can be useful if some data on /work or /shared has to be shared between several users (see the sketch below).
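
A hedged sketch of sharing a directory with such a group (the group name my_project and the directory shared-data are hypothetical):

# Give the project group read access to a directory in your work area
chgrp -R my_project /work/users/$USER/shared-data
chmod -R g+rX /work/users/$USER/shared-data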

References

  1. Endianness, endianness on Wikipedia.