
Overview of ROGER Cluster System Hardware

ROGER is a Dell cluster consisting of a total of 108 Intel Xeon E5-2660 v3 processors with a total of 13.3 TB of system memory available for computation.  Each Xeon E5 v3 processor is capable of performing up to 416 Gflops¹, yielding a peak CPU performance of 44.9 Tflops. In addition to CPUs, there are 12 Nvidia Tesla K40M graphics units available for tasks that require GPUs.  Each GPU is capable of performing at up to 1.68 Tflops², bringing the total cluster-wide peak performance to ~65 Tflops (theoretical).  The cluster is connected by a high-speed network with 40 Gb/s switches in the core and 10 Gb/s uplinks to each node.

¹ 2.6 GHz × 10 cores × 16 double-precision floating-point operations per clock cycle with AVX2 = 416 Gflops per processor; × 108 processors = 44.9 Tflops.
² Nvidia's published specification table for the Tesla K40M; maximum double-precision performance.
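
As a quick sanity check, the arithmetic behind these figures can be reproduced in a few lines of Python. This is a minimal sketch using only the numbers quoted on this page:

```python
# Peak-performance arithmetic for ROGER, using the figures quoted above.
GHZ = 2.6              # clock rate of each Xeon E5-2660 v3 core
CORES = 10             # cores per processor
FLOPS_PER_CYCLE = 16   # double-precision flop/cycle with AVX2 (footnote 1)
CPUS = 108             # processors cluster-wide
GPU_TFLOPS = 1.68      # Tesla K40M peak double precision (footnote 2)
GPUS = 12

cpu_gflops = GHZ * CORES * FLOPS_PER_CYCLE             # 416 Gflops per processor
cpu_total_tflops = cpu_gflops * CPUS / 1000.0          # ~44.9 Tflops
cluster_tflops = cpu_total_tflops + GPU_TFLOPS * GPUS  # ~65 Tflops

print(f"per-CPU peak:  {cpu_gflops:.0f} Gflops")
print(f"CPU total:     {cpu_total_tflops:.1f} Tflops")
print(f"cluster total: {cluster_tflops:.1f} Tflops")
```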

Compute Nodes  

ROGER comprises three distinct types of compute nodes: traditional compute nodes, high-memory compute nodes, and graphics (GPU) nodes. Continue reading for a complete description of each.

Batch Compute Nodes

The compute nodes are Dell PowerEdge R730 servers with two Intel Xeon E5-2660 v3 chips.  Each chip has 10 cores, each running at 2.6 GHz, and a 25 MB cache.  Each server has 256 GB of physical RAM and 500 GB of local storage for swap and scratch space.  There are 24 batch compute nodes. Their node names are in the format cg-cmpXX (where XX is the node number).

High Memory Nodes

The 16 high-memory compute nodes are identical to the traditional compute nodes, except for being equipped with 800 GB of local storage on solid-state drives (SSDs). They are called "high memory" because, before the upgrade of April 2016, they were the only nodes with 256 GB of RAM. These nodes were designed to run Hadoop workloads, and can sustain an 11 TB Hadoop filesystem on their SSDs. However, the production Hadoop system currently uses the GPFS shared filesystem for storage, which still has excellent performance and allows a much greater size: currently 175 TB is allocated. Their node names are in the format cg-hmXX (where XX is the node number).

Graphics Processing Unit (GPU) Nodes

The GPU compute nodes are identical to the traditional compute nodes with the addition of an Nvidia Tesla K40M GPU. Their node names are in the format  cg-gpuXX  (where XX is the node number).
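
All three naming schemes follow the same prefix-plus-number pattern. Purely as an illustration, here is a hypothetical Python sketch that expands them into full host lists; the zero-padding width and starting index are assumptions, not details confirmed on this page:

```python
# Hypothetical expansion of the node-name patterns cg-cmpXX, cg-hmXX, cg-gpuXX.
# Zero-padding to two digits and numbering from 01 are assumptions.
def node_names(prefix, count, start=1, width=2):
    """Generate node names such as cg-gpu01 .. cg-gpu12."""
    return [f"{prefix}{i:0{width}d}" for i in range(start, start + count)]

batch = node_names("cg-cmp", 24)  # 24 batch compute nodes
himem = node_names("cg-hm", 16)   # 16 high-memory nodes
gpu   = node_names("cg-gpu", 12)  # 12 GPU nodes

print(gpu[0], "...", gpu[-1])     # cg-gpu01 ... cg-gpu12
```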

Compute Hardware Breakdown

| Node Type | Node Qty | CPU | CPU Qty | Total Cores | RAM | Connectivity | Storage |
|-----------|----------|-----|---------|-------------|-----|--------------|---------|
| Compute¹ | 29 | Intel Xeon E5-2660 v3, 2.6 GHz, 25 MB cache, 10 cores | 2 | 20 | 256 GB | (1x) 10 Gb/s | 500 GB 7.2K rpm Nearline SAS 6 Gb/s |
| GPU² | 12 | (same) | 2 | 20 | 256 GB | (1x) 10 Gb/s | 500 GB 7.2K rpm Nearline SAS 6 Gb/s |
| SSD Nodes³ | 16 | (same) | 2 | 20 | 256 GB | (1x) 10 Gb/s | 800 GB SSD SAS 6 Gb/s |
| GridFTP | 2 | (same) | 2 | 20 | 256 GB | (1x) 40 Gb/s | 800 GB SSD SAS 6 Gb/s |

 


¹ 21 nodes are assigned to the batch system and 8 nodes are currently assigned to OpenStack to serve as hypervisors.
² 1 node is reserved as a login node and 11 nodes are used for batch computation.
³ 11 nodes are assigned for Hadoop use while 5 of these nodes serve as OpenStack hypervisors.

 

 

Service Nodes

In addition to the computing resources, the cluster also has 10 General Parallel File System (GPFS) server nodes, 2 high memory service nodes, and 1 administration node.

Of the GPFS server nodes, 8 are identical to the high-memory nodes except for local storage, where the single SSD is replaced by three 600 GB 15K rpm Serial Attached SCSI (SAS) drives for OS and filesystem use.  The remaining 2 have an additional six SSDs that connect via an internal 12 Gb/s SAS connection, allowing extremely fast filesystem metadata access.

 

The 2 high-memory service nodes have a 40 Gb/s network connection and provide GridFTP services for data transfer into and out of the cluster.  These nodes will also run various virtual machines (VMs), such as the user login node(s) and other cluster services.
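
As a rough how-to sketch, a transfer through these nodes could be driven with the standard globus-url-copy GridFTP client, wrapped here in Python. The endpoint hostname and paths below are placeholders, not ROGER's real addresses; the actual endpoint and authentication setup should come from the cluster's user documentation:

```python
# Hedged sketch: push a local file into the cluster over GridFTP using the
# standard globus-url-copy client. Hostname and paths are placeholders.
import subprocess

src = "file:///home/user/dataset.tar"                # local source file
dst = "gsiftp://roger-gridftp.example.edu/scratch/"  # placeholder endpoint

# -vb reports transfer performance; -p 4 opens four parallel TCP streams,
# which helps make use of the 40 Gb/s links on these nodes.
subprocess.run(["globus-url-copy", "-vb", "-p", "4", src, dst], check=True)
```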

Filesystem

All of ROGER's nodes have access to the cluster-wide filesystems. The filesystems are built using the General Parallel File System (GPFS) software from IBM and backed by NetApp E2700 storage units, each holding 180 SATA drives. The total usable disk space is 4.5 PB.

Storage Hardware

   

| Node | Node Qty | CPU | CPU Qty | Total Cores | RAM | Disk | Disk Qty |
|------|----------|-----|---------|-------------|-----|------|----------|
| GPFS server | 6 | Intel Xeon E5-2660 v3, 2.6 GHz, 25 MB cache, 10 cores | 2 | 20 | 256 GB | 600 GB 15K rpm SAS 6 Gb/s 2.5 in | 3 |
| GPFS server + CES | 2 | (same) | 2 | 20 | 256 GB | (same as above) | 3 |
| GPFS server + meta-data | 2 | (same) | 2 | 20 | 256 GB | (same as above) + 800 GB SSD SAS 12 Gb/s | 3 + 6 |
| NetApp E2700 | 1 | n/a | n/a | n/a | n/a | 4 TB SATA | 180 |
| NetApp E2700 | 5 | n/a | n/a | n/a | n/a | 6 TB SATA | 180 |
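
For a rough cross-check of the 4.5 PB usable figure from the Filesystem section, the following Python sketch totals the raw NetApp capacity listed in the table above; the gap between raw and usable space reflects RAID parity and filesystem overhead, whose exact layout is not documented here:

```python
# Raw NetApp capacity from the Storage Hardware table above.
enclosures = [
    (1, 180, 4),  # (enclosure qty, drives per enclosure, TB per drive)
    (5, 180, 6),
]
raw_tb = sum(qty * drives * tb for qty, drives, tb in enclosures)
usable_tb = 4500  # 4.5 PB usable, from the Filesystem section

print(f"raw capacity: {raw_tb} TB (~{raw_tb / 1000:.2f} PB)")  # 6120 TB
print(f"usable/raw:   {usable_tb / raw_tb:.0%}")               # ~74%
```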

 

OS Management Hardware

 

| Node | Node Qty | CPU | CPU Qty | Total Cores | RAM | Disk | Disk Qty | Disk | Disk Qty |
|------|----------|-----|---------|-------------|-----|------|----------|------|----------|
| Admin Node | 1 | Intel Xeon E5-2407 v2, 2.40 GHz, 10 MB cache, 4 cores | 1 | 4 | 12 GB | 1 TB | 2 (RAID mirror) | 300 GB | 2 (RAID mirror) |
| OpenStack management (network and control nodes) | 2 | (same) | 1 | 4 | 12 GB | 600 GB | 2 (RAID mirror) | | |

